bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 209/213 (98%) , Positives = 212/213 (99%) 

Query: 1 MTKEYEVEDMSKVAIVTGAGQGIGFAIAKRLHADGFKIGVLDYNEETAQAAVDKLSPEDA 60

+TK+YEVEDMSKVAIVTGAGQGIGFAIAKRLHADGFKIG+LDYNEETAQAAVDKLSPEDA
Sbjct: 1 LTKKYEVEDMSKVAIVTGAGQGIGFAIAKRLHADGFKIGILDYNEETAQAAVDKLSPEDA 60

Query: 61 VAWADVSKRDQVFDAFQKAATOTFGDLNVVVNNAGVAPTTPLDTITEEQFEKAFAINVGG 120 

VAWADVSKRDQVFDAFQKVVDTFGDLNVVVNNAGVAPTTPLDTITEEQFEKAFAINVGG 
Sbjct: 61 VAWADVSKRDQVFDAFQKWDTFGDLNVWNNAGVAPTTPLDTITEEQFEKAFAINVGG 120 

Query: 121 TIWGSQAAQKHFRELGHGGKIINATSQAGCEGNPNLTVYGGTKFAVRGITQTLAKDLASE 180

TIWGSQAAQKHFRELGHGGKIINATSQAGCEGNPNLTVYGGTKFAVRGITQTLAKDLASE 
Sbjct: 121 TIWGSQAAQKHFRELGHGGKIINATSQAGCEGNPNLTVYGGTKFAVRGITQTLAKDLASE 180 

Query: 181 GITVNAYAPGIVKTPMMFDIAHEVGKNAGKDDE 213 

GITVNAYAPGIVKTPMMF IAHEVGKNAGKDDE 
Sbjct: 181 GITVNAYAPGIVKTPMMFAIAHEVGKNAGKDDE 213 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
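
As a consistency check on alignment summaries such as "Identities = 209/213 (98%)", the percentages can be recomputed from the residue counts. The following is a minimal Python sketch and not part of the original analysis: the name parse_summary is hypothetical, and the sketch assumes the percentages are truncated to whole numbers, which is consistent with 212/213 being reported as 99%.

```python
import re

# Matches fields such as "Identities = 209/213" in an alignment summary line.
SUMMARY_FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)")


def parse_summary(line):
    """Return {label: (count, length, percent)} for one summary line.

    The percentage is recomputed with integer (floor) division; this is an
    assumption inferred from the figures quoted in the text.
    """
    result = {}
    for label, count, length in SUMMARY_FIELD.findall(line):
        n, d = int(count), int(length)
        result[label] = (n, d, (100 * n) // d)
    return result


if __name__ == "__main__":
    line = "Identities = 209/213 (98%) , Positives = 212/213 (99%)"
    for label, (n, d, pct) in parse_summary(line).items():
        print(f"{label}: {n}/{d} = {pct}%")
```

Comparing the recomputed value with the percentage printed in a summary line is a quick way to spot digits damaged in transcription.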

Example 1773 

A DNA sequence (GBSx1880) was identified in S.agalactiae <SEQ ID 5513> which encodes the amino
acid sequence <SEQ ID 5514>. This protein is predicted to be ATP-dependent DNA helicase. Analysis of 
this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.3735 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB38451 GB:L47709 22.4% identity with Escherichia coli 

DNA-damage inducible protein . . . ; putative [Bacillus subtilis] 
Identities = 132/461 (28%) , Positives = 231/461 (49%) , Gaps = 22/461 (4%) 

Query: 21 RKYAWDLEATGAGPNAS - - 1 IQVGIVI IQGNKI IDSYETDVNPHESLDEHI VHLTGITD 78 

+++ V+D+E TG P IIQ+ V+I+ +1 + + +NP++S+ I LTGI++ 

Sbjct: 4 QRFWIDVETTGNSPKKGDKIIQIAAWIENGQITERFSKYINPNKSIPAFIEQLTGISN 63 

Query: 79 KQIAKAPDFGQVAHHIYQLIEDCI FVAHNVKFDANLLAEQLFLEGCELRTPRI -DTVELS 137 

+ + F VA ++QL++ FVAHN+ FD + +L G +L + DTVELS 

Sbjct: 64 QMVENEQPFEAVAEEVFQLLDGAYFVAHNIHFDI^FVKYEMKAGFQLPDCEVLDTVELS 123 

Query: 138 QVFYPCLEKYSLGALAESLNIELTDAHTAIADARATAQLFIKLKAKISSLPKEVLETILT 197 

++ +P E Y L L+E L + HA +DA T +F+++ K+ LP L+ + 

Sbjct: 124 RIVFPGFEGYKLTELSEELQLRHDQPHRADSDAEVTGLIFLEILEKLRQLPYPTLKQLRR 183 

Query: 198 FADNLLFESYLLIEEAYQEADFvNPKEYYFWQGLVLKKEKAVGKPKKLSSDFQ 250 

+ + + + L++ E Y + +++ +A+ +F 

Sbjct: 184 LSQHFISDLTHLLDMFINENRHTEIPGYTRFSSFSVREPEAIDVRINEDENFSFEIESWE 243 

Query: 251 VNMALLGMDARPKQWFADLVKAHFNDQTTTFLEAQPGLGKTYGYLLP--LLDQ 302 

++ + G + R Q++ V F ++ +EA PG+GKT GYL+P L + 
Sbjct: 244 AGNEKALSELMPGYEKRDGQMMMMREVADAFANREHALIEAPPGIGKTIGYLIPAALFAK 303 

Query: 303 SQKQQIIVSVPTKILQDQIMAKEIKHIQELFHIPCHS--IKGPRNYLKLDAFYKSLQVQD 360

K+ +I+S + +LQ QI+ K++ +Q+LF P + +KG +YL L F + L +D 
Sbjct: 304 KSKKPVIISTYSTLLQQQILTKDLPIVQDLFPFPVTAAILKGQSHYLCLYKFEQVLHEED 363 


Query: 361 RNRLINRFKMQLLVWLTETTTGDLDEIKQKQRLESYFDQLKHDGE-VTQSSLFYDLDFWK 419 

N K QLLVWLTET TGD+ E+ + +D+L +D + +S + + F++ 

Sbjct: 364 DNYDAVLTKAQLLWLTETOT'GDVJffiL^PSGGKLLVroRLAYDDDSYKRSRSEHVIGFYE 423 

Query: 420 RSYDKVAQSQLVIINHAYFL-ERVQDDKDFAKGKVLVFDEA 459

R+ +S LVI NH+ L + K + + DEA 

Sbjct: 424 RAKQIAMRSDLVITNHSLLLTDEGSHKKRLPESGTFIIDEA 464 
Identities = 63/195 (32%) , Positives = 88/195 (44%) , Gaps = 16/195 (8%) 

Query: 629 KVWIDTSMPNILDLSPEQYAYEIAKRLQDIMTLKQPT-LVLLTSKQTMFMVSDYLDKWEI 687
+V I M +1 D ++ + A+ ++ + KQP LVL TS + V E+ 
Sbjct: 720 QVMIPKEMKSIQDTGQPEFIQDTARYIELMAKEKQPKILVLFTSHDMLKKVHQ ED 774 

Query: 688 KH LTQD-KNGLAYNVKKRFDRGESNLLLGTGSFWEGVDFVHRDRLIEVITR 737 

KH LQG+KF +LLGT FWEGVDF + +1 R

Sbjct: 775 KHNMSASGIQLJ^QGITGGSPGKLMKTFKTSNQAILLGTWIFWEGVDFPGDELTTVMIVR 834 

Query: 738 LPFDTPKDYFIQKLSQSLTKEGKNFFYDYSLPMTVLKLKQALGRTTRREEQKSAVIILDS 797 
LPF +P + K+GKN F SLP VL +Q +GR R K +IILD 

Sbjct: 835 LPFRSPDHPLHAAKCELARKKGKNPFQTVSLPEAVLTFRQGIGRLLRSAGDKGTIIILDR 894

Query: 798 RLVIKSYGQTIMHSL 812 

R+ YG+ + +L 
Sbjct: 895 RIKTAGYGRLFLDAL 909 






A related DNA sequence was identified in S.pyogenes <SEQ ID 5515> which encodes the amino acid 
sequence <SEQ ID 5516>. Analysis of this protein sequence reveals the following: 



Possible site: 37 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.3735 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 500/835 (59%) , Positives = 626/835 (74%) , Gaps = 2/835 (0%) 

Query: 1 MFCFIDIACYNRLTMTQKKLRKYAVVDLEATGAGPNASIIQVGIVIIQGNKIIDSYETDV 60
MFCFIDIACYNRLTMTQKKLRKYAVVDLEATGAGPNASIIQVGIVIIQGNKIIDSYETDV

Sbjct: 1 MFCFIDIACYNRLTMTQKKLRKYAVVDLEATGAGPNASIIQVGIVIIQGNKIIDSYETDV 60 

Query: 61 NPHESLDEHIVHLTGITDKQLAKAPDFGQVAHHIYQLIEDCIFVAHNVKFDANLLAEQLF 120 
NPHESLDEHIVHLTGITDKQLAKAPDFGQVAHHIYQLIEDCIFVAHNVKFDANLLAE LF 
Sbjct: 61 NPHESLDEHIVHLTGITDKQLAKAPDFGQVAHHIYQLIEDCIFVAHNVKFDANLLAEALF 120

Query: 121 LEGCELRTPRIDTVELSQVFYPCLEKYSLGALAESLNIELTDAHTAIADARATAQLFIKL 180 

LEG EL PR+DTVEL+Q+F+P EKY+L L+ LNI+L +AHTAIADARATA LF++L
Sbjct: 121 LEGYELTIPRVDTVELAQLFFPRFEKYNLSHLSRQLNIDLAEAHTAIADARATAILFLRL 180 


Query: 181 KAKISSLPKEVLETILTFADNLLFESYLLIEEAYQEADFVNPKEYYFWQGLVLKKEKAVG 240

KI SLP E LE++L ++D+LLFE+ ++I+E +A +P +Y + ++L K 
Sbjct: 181 LQKIESLPIECLESLLVYSDSLLFETAMvTQEGLAKAKPYDPNKYIKIRQILLPKGSKAL 240 

Query: 241 KPKKLSSDFQVNMALLGMDARPKQVVFADLVKAHFNDQTTTFLEAQPGLGKTYGYLLPLL 300

KP ++S F +NMALLG++ RPKQ FA L+ ++ +F+EAQ G+GKTYGYLLPLL 
Sbjct: 241 KPYQISKSFPINMALLGLEERPKQTQFAQLIDEDYHCGV7ASFIEAQTGIGKTYGYLLPLL 300 

Query: 301 DQSQKQQIIVSVPTKILQDQIMAKEIKHIQELFHIPCHSIKGPRNYLKLDAFYKSLQVQD 360 
+ + QIIVSVPTK+LQDQ+MA E+ IQE FHI CHS+KGP NYLKLD+F SL D
Sbjct: 301 AKEDQNQIIVSVPTKLLQDQLMAGEVAAIQEQFHIACHSLKGPANYLKLDSFADSLDQND 360

Query: 361 RNRLINRFKMQLLVWLTETTTGDLDEIKQKQRLESYFDQLKHDGEVTQSSLFYDLDFWKR 420 
+NRL+NR+KMQLLVWL ET TGDLDEIKQKQR +YF+QLKHDG++ QSS FYD DFW+
Sbjct: 361 QNRLVNRYKMQLLVWLLETKTGDLDEIKQKQRFAAYFEQLKHDGDIKQSSEFYDYDFWRV 420

Query: 421 SYDKVAQSQLVIINHAYFLERVQDDKDFAKGKVLVFDEAQKLVLGLENFSRGQLDISHQL 480 

SY+K ++L+I NHAYFL RVQDDKDFA+ KVLVFDEAQKL+L L+ SR QL+++ L 
Sbjct: 421 SYEKAKTARLLITNHAYFLHRVQDDKDFARNKVLVFDEAQKLMLQLDQLSRHQLMLTVFL 480


Query: 481 QVI QKI IDS S I PLLQKRLLES I S YELSHAVELFYRHNSFEFSETWLKRLKNS INALE WG 540 

Q IQ + + +PLL+KRLLES+S+EL +Y++ + + W R+ L 

Sbjct: 481 QTIQAKLSNPLPLLEKRLLESLSFELGQVSSDYYQNKEHQLAHDW-SRIAGYAKELTGAD 539 

Query: 541 LDELQTFFTATYTNYWFETDKVNEKRLTILRGAREDFLKFSKFLPPTKKTYMISATLQIS 600

ELQ FF + +YW ++K EKR+T L A + F+ F + LP T KTY +SATL IS 
Sbjct: 540 YQELQAFFATSDGDYWLSSEKQEEKRVTYLNSASKAFIHFQQLLPETVKTYFVSATLTIS 599 

Query: 601 PKVYLSDLLGGFSSISTEKIAHEKNANQKVWIDTSMPNILDLSPEQYAYEIAKRLQDIMT 660 
+V L+DLL GF I +K +Q V +D P + ++S + Y IAKR++ +

Sbjct: 600 SEVTLADLL-GFEEYLYHVIEKDKKQDQLVLVDQEAPIVTEVSDQIYVEAIAKRIESLKQ 658 

Query: 661 LKQPTLVLLTSKQTMFMVSDYLDKWEIKHLTQDKNGIAYNVKKRFDRGESNLLLGTGSFW 720 
P LVL SK+ + +VSDYLD+W++ HL Q+KNG AYN+ KKRFD+GE +LLG GSFW 
Sbjct: 659 EGYPILVLFNSKKHLLLVSDYLDQWQVPHLAQEKNGTAYNIKKRFDQGEQTILLGLGSFW 718

Query: 721 EGVDFVHRDRLIEVITRLPFDTPKDYFIQKLSQSLTKEGKNFFYDYSLPMTVLKLKQALG 780 

EGVDF+ DR+I +1 RLPFD P+D+F++K+S L ++GKN F DY LPMT+L+LKQA+G 
Sbjct: 719 EGVDFIQADRMITLIARLPFDNPEDFFVKKMSHYLLEKGKNPFRDYFLPMTILRLKQAIG 778


Query: 781 RTTRREEQKSAVIILDSRLVIKSYGQTIMHSLGRDFEISKEKINKVLTEMAKFLI 835 

RT RR++QKS VIILD RL+ KSYGQ I+ LG++F IS++ + L E FLI
Sbjct: 779 RTMRRQDQKSVVIILDRRLLTKSYGQVILEGLGQEFLISQQNFHDCLVETDCFLI 833

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1774 

A DNA sequence (GBSx1881) was identified in S.agalactiae <SEQ ID 5517> which encodes the amino
acid sequence <SEQ ID 5518>. Analysis of this protein sequence reveals the following: 

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2042 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9633> which encodes amino acid sequence <SEQ ID 9634> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF12702 GB:AF035157 aspartate aminotransferase [Lactococcus 
lactis] 

Identities = 270/391 (69%), Positives = 314/391 (80%) 

Query: 7 MTYLSERVLNMEESVTLAAGAKARELRVQGRDILSLTLGEPDFATPKNIQQAAIEAITDG 66

M S+ VL M+ESVTIAA +A+ L+ QGRDI+ LTLG+PDF TPK I QARIEAI +G 
Sbjct: 1 MKKCSDFvLK^ESVTIiAAANRAKALKAQGRDIIDLTLGQPDFPTPKKIGQAAIEAlNNG 60 






Query: 67 RASFYTPSSGLPELKSAINAYFERFYGYSLKPNQVVVGTGAKFILYTFFMTVLNPGDEVI 126
+ASFYT + GLPELK A+ Y+ RFY Y ++ N++++ GAKF LY +FM ++P DEVI
Sbjct: 61 QASFYTQAGGLPELKKAVQHYWTRFYAYEIQTNEILITAGAKFALYAYFMATVDPLDEVI 120



Query: 127 IPTPYWVSYADQIKMAEGKPVFVTAKEVOTFKVIVEQLEAVRTDKTKVILLNSPSNPTGM 186 

IP PYWVSY DQ+KMA G PV V AK+ N+FKVTVEQLE RT KTK+ +LLNS PSNPTGM 
Sbjct: 121 IPAPYWVSYVDQVKMAGGNPVIVEAKQEKNFKVTVEQLEKARTSKTKILLLNSPSNPTGM 180

Query: 187 IYKAEELEAIGNWAVEHDILILADDIYGRLVYNGNIFTPISSLSESIRNQTIVINGVSKT 246 

IY EEL AIG WAV HD+LILADDIY RLVYNG FT ISSLS+ IRN+T VINGVSKT 
Sbjct: 181 IYSKEELTAIGEWAVAHDLLILADDIYHRLVYNGAEFTAISSLSDEIRNRTTVINGVSKT 240 

Query: 247 YAMTGWRVGFAVGNHDIIAAMSKVVSQTTSNLTAVSQYATIEALNGSQESFEKMRLAFEE 306

+AMTGWR+G AVG+ +IIAAM+K+ SQTTSN TAV+QYA IEA + +SFEKM AFEE 
Sbjct: 241 FAMTGWRIGLAVGDPEIIAAMTKIASQTTSNPTAVAQYAAIEAFEENDKSFEKMHAAFEE 300 

Query: 307 RLNIIYPLLCQVPGFEVVKPQGAFYLFPNVTKAMEMKGYTDVTAFTDAILEEVGLALVTG 366

RLN IY L +VPGFE+VKP GAFYLFP VTKAM MKGYTDVT FT AILEE G+ALVTG 
Sbjct: 301 RLNKIYLQLSEVPGFELVKPNGAFYLFPKVTKAMAMKGYTDVTDFTTAILEEAGVALVTG 360

Query: 367 AGFGAPENVRLSYATDLETLKEAVRRLHVFM 397 

AGFG+PENVRLSYAT LETL+ AV RL +M 
Sbjct: 361 AGFGSPENVRLSYATSLETLEAAVTRLKDWM 391 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1005> which encodes the amino acid
sequence <SEQ ID 1006>. Analysis of this protein sequence reveals the following:

Possible site: 30 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.48 Transmembrane 95 - 111 ( 95 - 113) 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 301/397 (75%) , Positives = 343/397 (85%) 

Query: 7 MTYLSERVLNMEESVTLAAGAKARELRVQGRDILSLTLGEPDFATPKNIQQAAIEAITDG 66

M LS+RVL M+ESVTLAAGA+A+ L+ QGRD+L+LTLGEPDF TPK+IQ AIE+I +G
Sbjct: 1 MPKLSKRVLEMKESVTLAAGARAKALKAQGRDVLNLTLGEPDFFTPKHIQDKAIESIQNG 60

Query: 67 RASFYTPSSGLPELKSAINAYFERFYGYSLKPNQVVVGTGAKFILYTFFMTVLNPGDEVI 126

ASFYT +SGLPELK+AI Y + YGY L P+Q+V GTGAKFILY FFM VLNPGD+V+ 
Sbjct: 61 TASFYTNASGLPELKAAIATYLKNQYGYHLSPDQIVAGTGAKFILYAFFMAVLNPGDQVL 120 

Query: 127 IPTPYWSYADQIKMAEGKPVFVTAKEVTfflFKVTVEQLEAVRTDKTKVILLNSPSNPTGM 186 

IPTPYWVSY+DQ+KMAEG+P+FV E N FKVTV+QLE RT KTKV+L+NSPSNPTGM 
Sbjct: 121 IPTPYWVSYSDQVKMAEGQPIFVQGLEENQFKVTVDQLERARTSKTKVVLINSPSNPTGM 180 

Query: 187 IYKAEELEAIGNWAVEHDILILADDIYGRLVYNGNIFTPISSLSESIRNQTIVINGVSKT 246 

IY AEEL AIG WAV +DILILADDIYG LVYNGN F PIS+LSE+IR QTI +NGV+K+ 
Sbjct: 181 IYGAEELRAIGEWAVHNDILILADDIYGSLVYNGNQFVPISTLSEAIRRQTITVNGVAKS 240 

Query: 247 YAMTGWRVGFAVGNHDIIAAMSKVVSQTTSNLTAVSQYATIEALNGSQESFEKMRLAFEE 306

YAMTGWRVGFA G +II+AMSK++ QTTSNLT VSQYA IEA GSQ S E+MRLAFEE
Sbjct: 241 YAMTGWRVGFAAGEPEIISAMSKIIGQTTSNLTTVSQYAAIEAFCGSQSSLEEMRLAFEE 300

Query: 307 RLNIIYPLLCQVPGFEVVKPQGAFYLFPNVTKAMEMKGYTDVTAFTDAILEEVGLALVTG 366

RLNI YPLLCQVPGFEVVKPQGAFY FPNV KAMEM G++DVT+F +AILEEVGLA+V+G
Sbjct: 301 RLNITYPLLCQVPGFEVVKPQGAFYFFPNVKKAMEMTGFSDVTSFANAILEEVGLAVVSG 360

Query: 367 AGFGAPENVRLSYATDLETLKEAVRRLHVFMGSNEIN 403 

AGFGAPENVRLSYATD+ETLKEAVRRLHVFM SNEIN 
Sbjct: 361 AGFGAPENVRLSYATDIETLKEAVRRLHVFMESNEIN 397 



Final Results 



bacterial membrane --- Certainty=0.1192 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1775 

A DNA sequence (GBSx1882) was identified in S.agalactiae <SEQ ID 5519> which encodes the amino
acid sequence <SEQ ID 5520>. This protein is predicted to be asparaginyl-tRNA synthetase (asnS). 
Analysis of this protein sequence reveals the following: 

Possible site: 46 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1488 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05415 GB:AP001512 asparaginyl-tRNA synthetase [Bacillus halodurans] 
Identities = 252/442 (57%) , Positives = 316/442 (71%) , Gaps = 15/442 (3%) 



Query: 7 SIVDVKDYVGQEVTIGAWVANKSGKGKIAFVQLRDGSAFFQGVAFKPNFIEKYGEESGLE 66

+ 1 + YV QEVT+GAW+ANK GKIAF+QLRDG+ F QGV K E G E
Sbjct: 4 TIAKIGQYVDQEVTLGAWLANKRSSGKIAFLQLRDGTGFIQGVWKA EVGDE 55

Query: 67 KFDVIKRLNQETSVYVTGIVKEDERSKFGYELDITDLEVIGESHEYPITPKEHGTDFLMD 126

F KIi QE+S+YVTGIV++DER+ GYEL +T ++I E+ +YPITPKEHGT+FLMD
Sbjct: 56 WFQKAKNLTQESSLYVTGITOKDERAPSGYELTVTSFDIIHEATDYPITPKEHGTEFLMD 115

Query: 127 NRHLWLRSRKQMAVMQIRNAIIYSTYEFFDQNGFIKFDSPILSENAAEDSTELFETDYFG 186

+RHLW+RSRKQ AV++IRN II +TYEFF +NGF+K D PIL+ +A E +TELF T YF
Sbjct: 116 HRHLWIRSRKQHAVLRIRNEIIRATYEFFHENGFVKVDPPILTGSAPEGTTELFHTKYFD 175

Query: 187 KPAFLSQSGQLYLEAGAMALGRVFDFGPVFRAEKSKTRRHLTEFWMMDAEYSFLSHEESL 246

+ AFLSQSGQLY+EA A+A GRVF FGP FRAEKSKTRRHL EFWM++ E +F+ EESL
Sbjct: 176 EDAFLSQSGQLYMEAAAIAFGRVFSFGPTFRAEKSKTRRHLIEFWMIEPEMAFVEFEESL 235

Query: 247 DLQEAYVKALIQGVLDRAPQALDILERDVEALKRYIAEPFKRVSYDDAITLLQEHEADED 306

++QE YV ++Q VL L L RD L+ I PF R+SYDDAI L E D+
Sbjct: 236 EIQENYVAYIVQSVLKHCAIELKTLGRDTSVLES-IQAPFPRISYDDAIKFLHEKGFDD- 293

Query: 307 TDYEHLEHGDDFGSPHETWISNYFGVPTFVVNYPASFKAFYMKPVPGNPERVLCADLLAP 366

+E GDDFG+PHET I+ +F P F+ +YP S K FYM+P P + VLCADL+AP
Sbjct: 294 IEWGDDFGAPHETAIAEHFDKPVFITHYPTSIiKPFYMEPDPNRDDWLCADLIAP 348

Query: 367 EGYGEIIGGSMREDDYDALVAKMDELGMDKSEYDFYLDLRKYGSVPHGGFGIGIERMVTF 426

EGYGEIIGGS R DYD L +++E + Y +YLDLRKYGSVPH GFG+G+ER V +
Sbjct: 349 EGYGEIIGGSQRISDYDLLKKRLEEHDLSLDAYAWYLDLRKYGSVPHSGFGLGLERTVGW 408

Query: 427 VAGTKHIREAIPFPRMLHRIKP 448

++G H+RE IPFPR+L+R+ P
Sbjct: 409 ISGAGHVRETIPFPRLLNRLYP 430





A related DNA sequence was identified in S.pyogenes <SEQ ID 5521> which encodes the amino acid 
sequence <SEQ ID 5522>. Analysis of this protein sequence reveals the following: 

Possible site: 46 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1488 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 443/448 (98%) , Positives = 447/448 (98%) 

Query: 1 MSKKLISIVDVKDYVGQEVTIGAWVANKSGKGKIAFVQLRDGSAFFQGVAFKPNFIEKYG 60

MSKKLISIVDVKDYVGQEVTIGAWVANKSGKGKIAFVQLRDGSAFFQGVAFKPNFIEKYG 
Sbjct: 1 MSKKLISIVDVKDYVGQEVTIGAWVANKSGKGKIAFVQLRDGSAFFQGVAFKPNFIEKYG 60 

Query: 61 EESGLEKFDVIKRLNQETSVYVTGIVKEDERSKFGYELDITDLEVIGESHEYPITPKEHG 120 

EESGLEKFDVIKRLNQETSVYVTGIVKEDERSKFGYELDITDLE+IGESHEYPITPKEHG 
Sbjct: 61 EESGLEKFDVIKRLNQETSVYVTGIVKEDERSKFGYELDITDLEIIGESHEYPITPKEHG 120 

Query: 121 TDFLMDNRHLWLRSRKQMAVMQIRNAIIYSTYEFFDQNGFIKFDSPILSENAAEDSTELF 180 

TDFLMDNRHLWLRSRKQMAVMQIRNAIIY+TYEFFDQNGFIKFDSPILSENAAEDSTELF 
Sbjct: 121 TDFLMDNRHLWLRSRKQMAVMQIRNAIIYATYEFFDQNGFIKFDSPILSENAAEDSTELF 180

Query: 181 ETDYFGKPAFLSQSGQLYLEAGAMALGRVFDFGPVFRAEKSKTRRHLTEFWMMDAEYSFL 240

ETDYFGKPAFLSQSGQLYLEAGAMALGRVFDFGPVFRAEKSKTRRHLTEFWMMDAEYSFL 
Sbjct: 181 ETDYFGKPAFLSQSGQLYLEAGAMALGRVFDFGPVFRAEKSKTRRHLTEFWMMDAEYSFL 240 

Query: 241 SHEESLDLQEAYVKALIQGVLDRAPQALDILERDVEALKRYIAEPFKRVSYDDAITLLQE 300 

SHEESLDLQEAYVKALIQGVLDRAPQALDILERDVEALKRYI EPFKRVSYDDAITLLQE 
Sbjct: 241 SHEESLDLQEAYVKALIQGVLDRAPQALDILERDVEALKRYITEPFKRVSYDDAITLLQE 300 

Query: 301 HEADEDTDYEHLEHGDDFGSPHETWISNYFGVPTFVVNYPASFKAFYMKPVPGNPERVLC 360

HEADEDTDYEHLEHGDDFGSPHETWISNYFGVPTFVVNYPASFKAFYMKPVPGNPERVLC
Sbjct: 301 HEADEDTDYEHLEHGDDFGSPHETWISNYFGVPTFVVNYPASFKAFYMKPVPGNPERVLC 360

Query: 361 ADLLAPEGYGEIIGGSMREDDYDALVAKMDELGMDKSEYDFYLDLRKYGSVPHGGFGIGI 420 

ADLLAPEGYGEIIGGSMRED+YDALVAKMDELGMDKSEYDFYLDLRKYGSVPHGGFGIGI 
Sbjct: 361 ADLLAPEGYGEIIGGSMREDNYDALVAKMDELGMDKSEYDFYLDLRKYGSVPHGGFGIGI 420

Query: 421 ERMVTFVAGTKHIREAIPFPRMLHRIKP 448 

ERMVTFVAGTKHIREAIPFPRMLHRI+P 
Sbjct: 421 ERMVTFVAGTKHIREAIPFPRMLHRIRP 448

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1776 

A DNA sequence (GBSx1883) was identified in S.agalactiae <SEQ ID 5523> which encodes the amino
acid sequence <SEQ ID 5524>. Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -6.85 Transmembrane 103 - 119 ( 102 - 127) 

INTEGRAL Likelihood = -5.04 Transmembrane 73 - 89 ( 68 - 93) 

INTEGRAL Likelihood = -4.19 Transmembrane 31 - 47 ( 31 - 49) 

INTEGRAL Likelihood = -1.86 Transmembrane 157 - 173 ( 157 - 173) 



Final Results 

bacterial membrane Certainty=0.3739 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD40355 GB:AF036485 hypothetical protein [Plasmid pNZ4000] 
Identities = 39/135 (28%) , Positives = 72/135 (52%) , Gaps = 4/135 (2%) 

Query: 3 KSPARLISFISIAIAINLVGANLALFLRLPIYLDTIGTLLIAVILGPWYAASTAFLSALI 62 

K A ++ I A+ IN V LA L+LP++L ++GT L +++ GP A + F++ +1 
Sbjct: 15 KLSAATMTLIPAAVGINVVAKALAEGLKLPVWLGSLGTFLASMLAGPVAGAISGFINNVI 74



Query: 63 NWMTTDIFSLYYSPVAIWAIITGILIKRNCKPSS--LLWKSLIISLPGTIIASVITVIL 120 

+T S Y+ +I + I G+L S+ + ++II++ +I++ + VI

Sbjct: 75 YGLTLSPISTVYAITSIGIGIAVGVLHMIGWFSSARRVFVSAIIIAIVSAVISTPLNVIF 134 

Query: 121 FKGIT--SSGSSIIA 133 
+GT+GS+A

Sbjct: 135 WGGQTGIAWGDSLFA 149 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1777 

A DNA sequence (GBSx1884) was identified in S.agalactiae <SEQ ID 5525> which encodes the amino
acid sequence <SEQ ID 5526>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1873 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 






>GP:AAC75223 GB:AE000305 orf, hypothetical protein [Escherichia coli K12] 
Identities = 97/305 (31%) , Positives = 160/305 (51%) , Gaps = 10/305 (3%) 

Query: 1 MNKEKIIIDCDPGIDDTLALMYAIQHPKLEVVAITITAGNSPVELGLKNTFVTLELLNRH 60

M K KII+DCDPG DD +A+M A +HP ++++ ITI AGN ++ L N + L 

Sbjct: 1 MEKRKIILDCDPGHDDAIAIMMAAKHPAIDLLGITIVAGNQTLDKTLINGLNVCQKL-EI 59

Query: 61 DIPVYVGDNLPLQREFVSAQDTHGMDGLGENNFTLAQPIIFQEESADCF1ANYFEHK 117

++PVY G P+ R+ + A + HG GL F +P+ Q ES + 
Sbjct: 60 NVPVYAGMPQPIMRQQIVADNIHGETGLDGPVF EPLTRQAESTHAVKYI IDTLMASD 116 

Query: 118 NDTSIIALGPLTNIARALQTNPKLGKHCKRFISMGGSFKSHGNCSPVAEYNYWCDPHAAQ 177 
D +++ +GPL+NIA A++ P + + + MGG++ + GN +P AE+N + DP AA+

Sbjct: 117 GDITLVPVGPLSNIAVAMRMQPAILPKIREIVLMGGAYGT-GNFTPSAEFNIFADPEAAR 175 

Query: 178 YVFENLDKKIEMVGLDITRHIVLTPNHLSYMERINPDVSSFIQKITKFYFDFHWQYEHII 237 
VF + + M+GLD+T V TP+ ++ MER IF ++ +

Sbjct: 176 VVFTS-GVPLVMMGLDLTNQTVCTPDVIARMERAGGPAGELFSDIMNFTLKTQFENYGLA 234

Query: 238 GCVINDPLAIAYFVNENIATGFDSYTDVACH-GIAMGQTIVDQYHFYKKDANSKILTSVN 296 

G ++D IY+N+ +Y+V+G G+T+ D+ K AN+K+ +++ 

Sbjct: 235 GGPVHDATCIGYLINPDGIKTQEMYVEVDVNSGPCYGRTVCDELGVLGKPANTKVGITID 294 






Query: 297 TNLFW 301 

T+ FW 
Sbjct: 295 TDWFW 299 



No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1778 

A DNA sequence (GBSx1885) was identified in S.agalactiae <SEQ ID 5527> which encodes the amino
acid sequence <SEQ ID 5528>. Analysis of this protein sequence reveals the following:

Possible site: 53

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.1860 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB62728 GB:AL133423 hypothetical protein SC4A7.24C 
[Streptomyces coelicolor A3 (2) ] 
Identities = 36/134 (26%) , Positives = 57/134 (41%) , Gaps = 7/134 (5%) 

Query: 1 MLYEVTSSNTQGVDGKVYLSNGKIVETNHPLNHL PGFNPEELIALAWSTCUJATIK 56 

+LY ++ G DG+V +G++ +P + G NPE+L A +S C + 

Sbjct: 8 VLYTAVATAENGRDGRVATDDGRLDWVNPPKEMGGNGAG'TISIPEQLFAAGYSACFQGALG 67 

Query: 57 AILEQKGFKDLKSRVDVTCQLMKEKQVGKGFYFQVNAVASIEKLSLSDSKLIVNKAHSRC 116 

+ Q+G S V +K GFVAI+++++V KAH C 

Sbjct: 68 WARQEGADISGSTVTAKVGIGKNDD GFGIIvEISAEIPTvDAATARSLVEKAHQVC 124 

Query: 117 PISKLISNAKTINL 130 

P SK T+ L 

Sbjct: 125 PYSKATRGNITVTL 138 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1779 

A DNA sequence (GBSx1886) was identified in S.agalactiae <SEQ ID 5529> which encodes the amino
acid sequence <SEQ ID 5530>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.0531 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9635> which encodes amino acid sequence <SEQ ID 9636> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15482 GB:Z99121 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 164/285 (57%) , Positives = 207/285 (72%) , Gaps = 2/285 (0%) 

Query: 6 IKLVIVTGMSGAGKTVAIQSFEDLGYFTIDNMPPTLVPKFLELAAQSGDT-SKIAMVVDM 64

I+LVI+TGMSGAGKTVAIQSFEDLGYF +DN+PP+L+PKFLEL +S SK+A+V+D+ 
Sbjct: 9 IQLVIITGMSGAGKTVAIQSFEDLGYFCVDNLPPSLLPKFLELMKESNSKMSKVALVMDL 68 

Query: 65 RSRLFFREINSILDSLEINDNINFKILFLDATDTELVSRYKETRRSHPLAADGRVLDGIS 124 

R R FF + LD + N I +ILFLDA D+ LV+RYKETRRSHPLAA G L+GI+
Sbjct: 69 RGREFFDRLIEALDEMAENPWITPRILFLDAKDSILVTRYKETRRSHPLAATGLPLEGIA 128

Query: 125 LERELLAPLKSMSQNVVDTSELTPRQLRKVISKEFSNQDSQSSFRIEVMSFGFKYGIPLD 184

LERELL LK SQ + DTS++ PR LR+ I K F+ ++ F + VMSFGFKYGIP+D 
Sbjct: 129 LERELLEELKGRSQIIYDTSDMKPRDLREKIVKHFATNQGET-FTVNVMSFGFKYGIPID 187 



Query: 185 ADLVFDVRFLPNPYYKPELRDKTGLDTEVYDYVMSFDESDDFYDHLLALIKPILPGYQNE 244
ADLVFDVRFLPNPYY +R TG D EV YVM ++E+ F + L+ L+ +LP Y+ E
Sbjct: 188 ADLVFDVRFLPNPYYIESMRPLTGKDKEVSSYVMKWNETQKFNEKLIDLLSFMLPSYKRE 247

Query: 245 GKSVLTVAIGCTGGQHRSTAFAHRLSEDLKADWTVNESHRDKNKR 289

GKS + +AIGCTGGQHRS A L++ K D+ + +HRD KR 
Sbjct: 248 GKSQWIAIGCTGGQHRSVTLAENLADYFKKDYYTIWTHRDIEKR 292 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5531> which encodes the amino acid
sequence <SEQ ID 5532>. Analysis of this protein sequence reveals the following:

Possible site: 20 
>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB15482 GB:Z99121 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 164/291 (56%) , Positives = 213/291 (72%) , Gaps = 3/291 (1%) 



Query: 1 MSDKH-INLVIVTGMSGAGKTVAIQSFEDLGYFTIDNMPPALVPKFLELIEQTNENR-RV 58

+S+ H I LVI+TGMSGAGKTVAIQSFEDLGYF +DN+PP+L+PKFLEL++++N +V
Sbjct: 3 VSESHDIQLVIITGMSGAGKTVAIQSFEDLGYFCVDNLPPSLLPKFLELMKESNSKMSKV 62

Query: 59 ALVVDMRSRLFFKEINSTLDSIESNPSIDFRILFLDATDGELVSRYKETRRSHPLAADGR 118

ALV+D+R R FF + LD + NP I RILFLDA D LV+RYKETRRSHPLAA G
Sbjct: 63 ALVMDLRGREFFDRLIEALDEMAENPWITPRILFLDAKDSILVTRYKETRRSHPLAATGL 122

Query: 119 VLDGIRLERELLSPLKSMSQHVVDTTKLTPRQLRKTISDQFSEGSNQASFRIEVMSFGFK 178

L+GI LERELL LK SQ + DT+ + PR LR+ I F+ + +F + VMSFGFK
Sbjct: 123 PLEGIALERELLEELKGRSQIIYDTSDMKPRDLREKIVKHFATNQGE-TFTVNVMSFGFK 181

Query: 179 YGLPLDADLVFDVRFLPNPYYQVELREKTGLDEDVFNYVMSHPESEVFYKHLLNLIVPIL 238

YG+P+DADLVFDVRFLPNPYY +R TG D++V +YVM E++ F + L++L+ +L
Sbjct: 182 YGIPIDADLVFDVRFLPNPYYIESMRPLTGKDKEVSSYVMKMNETQKFNEKLIDLLSFML 241

Query: 239 PAYQKEGKSVLTVAIGCTGGQHRSVAFAHCLAESLATDWSVNESHRDQNRR 289

P+Y++EGKS + +AIGCTGGQHRSV A LA+ D+ + +HRD +R
Sbjct: 242 PSYKREGKSQWIAIGCTGGQHRSVTLAENLADYFKKDYYTHVTHRDIEKR 292





An alignment of the GAS and GBS proteins is shown below. 

Identities = 234/296 (79%) , Positives = 263/296 (88%) 



Query: 1 MSDEQIKLVIVTGMSGAGKTVAIQSFEDLGYFTIDNMPPTLVPKFLELAAQSGDTSKIAM 60

MSD+ I LVIVTGMSGAGKTVAIQSFEDLGYFTIDNMPP LVPKFLEL Q+ + ++A+
Sbjct: 1 MSDKHINLVIVTGMSGAGKTVAIQSFEDLGYFTIDNMPPALVPKFLELIEQTNENRRVAL 60

Query: 61 VVDMRSRLFFREINSILDSLEINDNINFKILFLDATDTELVSRYKETRRSHPLAADGRVL 120

VVDMRSRLFF+EINS LDS+E N +I+F+ILFLDATD ELVSRYKETRRSHPLAADGRVL
Sbjct: 61 VVDMRSRLFFKEINSTLDSIESNPSIDFRILFLDATDGELVSRYKETRRSHPLAADGRVL 120

Query: 121 DGISLERELLAPLKSMSQNVVDTSELTPRQLRKVISKEFSNQDSQSSFRIEVMSFGFKYG 180

DGI LERELL+PLKSMSQ+VVDT++LTPRQLRK IS +FS +Q+SFRIEVMSFGFKYG
Sbjct: 121 DGIRLERELLSPLKSMSQHVVDTTKLTPRQLRKTISDQFSEGSNQASFRIEVMSFGFKYG 180

Query: 181 IPLDADLVFDVRFLPNPYYKPELRDKTGLDTEVYDYVMSFDESDDFYDHLLALIKPILPG 240

+PLDADLVFDVRFLPNPYY+ ELR+KTGLD +V++YVMS ES+ FY HLL LI PILP
Sbjct: 181 LPLDADLVFDVRFLPNPYYQVELREKTGLDEDVFNYVMSHPESEVFYKHLLNLIVPILPA 240

Query: 241 YQNEGKSVLTVAIGCTGGQHRSTAFAHRLSEDLKADWTVNESHRDKNKRKETVNRS 296

YQ EGKSVLTVAIGCTGGQHRS AFAH L+E L DW+VNESHRD+N+RKETVNRS
Sbjct: 241 YQKEGKSVLTVAIGCTGGQHRSVAFAHCLAESLATDWSVNESHRDQNRRKETVNRS 296

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1780 

A DNA sequence (GBSx1887) was identified in S.agalactiae <SEQ ID 5533> which encodes the amino
acid sequence <SEQ ID 5534>. Analysis of this protein sequence reveals the following:

Possible site: 36 

>>> Seems to have an uncleavable N-term signal seq

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB96620 GB:AJ400630 hypothetical protein [Streptococcus pneumoniae bacteriophage MM1]

Identities = 254/321 (79%) , Positives = 286/321 (88%) , Gaps = 1/321 (0%) 

Query: 1 MRKPKITVIGGGTGIPVILKSLRLEDVEITAVVTVADDGGSSGELRSVMQ-LTPPGDLRN 59 
MRKPKITVIGGGTGIPVILKSLR +DVEI A+VTVADDGGSSGELR MQ LTPPGDLRN
Sbjct: 1 MRKPKITVIGGGTGIPVILKSLREKDVEIAAIVTVADDGGSSGELRKNMQQLTPPGDLRN 60

Query: 60 VLVALSDMPKFYEQIFQYRFAEGDGDFAGHPLGNLIIAGVAEMQGSTYNAMQSLTQFFHT 119

VLVA+SDMPKFYE++FQYRF+E G FAGHPLGNLIIAG++EMQGSTYNAMQ L++FFHT 
Sbjct: 61 VLVAMSDMPKFYEKVFQYRFSEDAGAFAGHPLGNLIIAGLSEMQGSTYNAMQLLSKFFHT 120 


Query: 120 TGKIYPSSEHPLTLHAVFKDGHEVVGESQIADYKGMIDHVYVTNTYNEETPTASRKVVDA 179

TGKIYPSS+HPLTLHAVF+DG EV GES I D++G+ID+VYVTN N++TP ASR+VV
Sbjct: 121 TGKIYPSSDHPLTLHAVFQDGTEVAGESHIVDHRGIIDNVYVTNRLNDDTPIASRRVVQT 180

Query: 180 ILESDMIVLGPGSLFTSILPNLVIPEIKQALLETRAEVAYVCNIMTQRGETEHFTDADHV 239

ILESDMIVLGPGSLFTSILPN+VI EI +ALLET+AE+AYVCNIMTQRGETEHFTD+DHV 
Sbjct: 181 ILESDMIVLGPGSLFTSILPNIVIKEIGRALLETKAEIAYVCNIMTQRGETEHFTDSDHV 240 

Query: 240 EVLKRHLGQDAIDTVLVNIEKVPESYMENNHFDEYLVQVEHDFSGLRKHARRVISSNFLK 299 
EVL RHLG+ IDTVLVNIEKVP+ YM +N FDEYLVQVEHDF GL K RVISSNFL+

Sbjct: 241 EVLHRHLGRPFIDTVLVNIEKVPQEYMNSNRFDEYLVQVEHDFVGLCKQVSRVISSNFLR 300 

Query: 300 LEKGGAFHHGDFVVEELMNLV 320
LE GGAFH GD +V+ELM ++
Sbjct: 301 LENGGAFHDGDLIVDELMRII 321

A related DNA sequence was identified in S.pyogenes <SEQ ID 5535> which encodes the amino acid
sequence <SEQ ID 5536>. Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have an uncleavable N-term signal seq

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 251/320 (78%) , Positives = 284/320 (88%) 

Query: 1 MRKPKITVIGGGTGIPVILKSLRLEDVEITAVVTVADDGGSSGELRSVMQLTPPGDLRNV 60

M+ PK+TVIGGGTGI +ILKSLR E V+ITAVVTVADDGGSSGELR+ MQL PPGDLRNV
Sbjct: 1 MKNPKMTVIGGGTGISIILKSLRNEAVDITAVVTVADDGGSSGELRNAMQIAPPGDLRNV 60



Query: 61 LVALSDMPKFYEQIFQYRFAEGDGDFAGHPLGNLIIAGVAEMQGSTYNAMQSLTQFFHTT 120
L+A+SDMPKFYE++FQYRF E DG AGHPLGNLIIAG++EMQGSTYNA+Q LT+FFH T
Sbjct: 61 LIAMSDMPKFYERVFQYRFNESDGALAGHPLGNLIIAGISEMQGSTYNAIQILTKFFHIT 120 

Query: 121 GKIYPSSEHPLTLHAVFKDGHEVVGESQIADYKGMIDHVYVTNTYNEETPTASRKVVDAI 180
GKIYPSSE LTLHAVFKDGHEV GES IA Y GMIDHVYVTNTYN++ P ASRKVV+AI

Sbjct: 121 GKIYPSSEQALTLHAVFKDGHEVAGESSIAKYPGMIDHVYVTNTYNDQKPQASRKVVEAI 180

Query: 181 LESDMIVLGPGSLFTSILPNLVIPEIKQALLETRAEVAYVCNIMTQRGETEHFTDADHVE 240
LESDMIVLGPGSLFTSILPNLVIPEIK+AL +T+AEV Y+CNIMTQ GETE F+DADHV
Sbjct: 181 LESDMIVLGPGSLFTSILPNLVIPEIKEALRQTKAEVVYICNIMTQYGETEQFSDADHVA 240

Query: 241 VLKRHLGQDAIDTVLVNIEKVPESYMENNHFDEYLVQVEHDFSGLRKHARRVISSNFLKL 300

VL +HLG+D IDTVLVN+ KVP++YM +N FDEYLVQV+HDF+GL + A+RVISS FL+L 
Sbjct: 241 VLNQHLGRDLIDTVLVNVAKVPQAYMNSNKFDEYLVQVDHDFAGLCRAAKRVISSYFLRL 300 


Query: 301 EKGGAFHHGDFVVEELMNLV 320

E GGAFH G+ VVEELMNLV
Sbjct: 301 ENGGAFHDGNLVVEELMNLV 320

SEQ ID 5534 (GBS269) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 49 (lane 12; MW 35kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 54 (lane 5; MW 60.5kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1781

A DNA sequence (GBSxl888) was identified in S.agalactiae <SEQ ID 5537> which encodes the amino 
acid sequence <SEQ ID 5538>. Analysis of this protein sequence reveals the following:

Possible site: 34

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0 . 2479 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 
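
Each "Final Results" block lists one certainty value per candidate location, and the highest value gives the predicted localisation. A minimal sketch of reading such a block is shown below. It tolerates the stray spaces that appear inside the numbers in this text; the regular expression and the helper names `parse_final_results` and `most_likely` are assumptions made for illustration.

```python
import re

# Location, certainty (possibly containing stray spaces), and verdict in parentheses.
RESULT_RE = re.compile(
    r"bacterial\s+(\w+)\s+Certainty\s*=\s*([\d\s.]+?)\s*\(([^)]+)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for every result line in text."""
    results = {}
    for location, raw_value, verdict in RESULT_RE.findall(text):
        value = float(re.sub(r"\s+", "", raw_value))  # "0 . 2479" -> 0.2479
        results[location] = (value, verdict.strip())
    return results


def most_likely(results):
    """Location with the highest certainty value."""
    return max(results, key=lambda loc: results[loc][0])


block = """
bacterial cytoplasm Certainty=0 . 2479 (Affirmative) < suco
bacterial membrane Certainty=0 . 0000 (Not Clear) < suco
bacterial outside Certainty=0 . 0000 (Not Clear) < suco
"""
results = parse_final_results(block)
print(most_likely(results), results[most_likely(results)])
```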

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB96619 GB:AJ400630 hypothetical protein [Streptococcus pneumoniae bacteriophage MM1] 
Identities = 209/303 (68%) , Positives = 260/303 (84%) 

Query: 1 MSFTVKVKEELLGHKSENKMELSAIIKMSGSLGIiANHGLNLSITTENAICIARHIYSMLEE 60

MSFTV VKEE+LG ++ ELSAI I KMSGS+GL+ GL hS+ TENAK+ARH+Y
Sbjct: 1 MSFTOAVKEEILGQHHLSRHELSAIIKMSGSIGLSTSGLTLSVVTENAKLARHLYESFLH 60

Query: 61 HYHLQPEIOHQKTNLRKNRWTVFIEEKVDVILM)LK1jADAFFGIETGIEHSILDNDEN 120

Y ++ EI++HQ++NLRKNRVYTVF +EKV +L+DL LAD+FFG+ETGI+ +IL ++E
Sbjct: 61 FYE I KSE IRHHQRSNLRKNRVYTVFTDEKVQDLLSDLHIiADS FFGLETGIDEAI LSDEEA 120

Query: 121 GRAYLRGAFLSTGTWEPDSGKYQLEIFSVYLDHAQDLANLMKKFMLDAKVIEHKHGAVT 180

GRAYL GAFL+ G++R+P+SGKYQLEI SVYLDHAQ +A+L+++F+LDAKV+E K GAVT
Sbjct: 121 GRAYLCGAFIANGSIRDPESGKYQLEISSvYLDHAQGIASLLQQFLLDAKVLERKKGAVT 180

Query: 181 YLQKAEDIMDFLIVIDAMFARDAFEEIKMIRETRNDINRANNVETANIARTITASMKTIN 240

YLQ+AEDIMDFLIVI AM+ARD FE +K++RETRND+NRANN ETANIART++ASMKTIN
Sbjct: 181 YLQRAEDII^FLIVIGAMQARDDFERvTCILRETRNDLNRANNAETANIARTVSASMKTIN 240

Query: 241 NIIKIMDTIGFDALPSDLRQVAQVRVAHPDYSIQQIADSLETPLSKSGVNHRLRKINKIA 300

NI KI D +G + LP DL++VAQ+R+ HPDYSIQQ+ADSL TPL+KSGVNHRLRKINKIA
Sbjct: 241 NISKIKDIMGLENLPVDLQEVAQLRIQHPDYSIQQLADSLSTPLTKSGVNHRLRKINKIA 300

Query: 301 DEL 303

DEL
Sbjct: 301 DEL 303

A related DNA sequence was identified in S.pyogenes <SEQ ID 5539> which encodes the amino acid
sequence <SEQ ID 5540>. Analysis of this protein sequence reveals the following:

Possible site: 35

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1698 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below.

Identities = 222/303 (73%) , Positives = 269/303 (88%) 





Query: 1 MSFTVKVKEELLGHKSENKMELSAIIKMSGSLGIANHGLNLSITTENAKIARHIYSMLEE 60

MSFT KVKEEL+ + + EL+AIIK+SGSLGLA+ L+LSITTENAKIAR-HYS++E+
Sbjct: 1 MSFTTKVKEELIHLSTGDNNEIAAIIKLSGSLGLAHQSLHLSITTENAKIARYIYSLIED 60

Query: 61 HYHLQPEIKYHQKTNLRKNRVYTVFIEEKVDVIIADLKLADAFFGIETGIEHSILDNDEN 120

Y + PE I + YHQKTNLRKNR VYTV+ +E+ V+ ILADLKLAD+FFG+ETGIE +L +D
Sbjct: 61 AYVIVPEIRYHQKTNLRKNRVYTVYVEQGVETILADLKLADSFFGLETGIEPQVLSDDNA 120

Query: 121 GRAYLRGAFLSTGTVREPDSGKYQLEIFSWLDHAQDIANLMKKFMLDAKVIEHKHGAVT 180

GR+YL+GAFL+ G++R+P+SGKYQLEI+SVYLDHAQDLA LM+KFMLDAK IEHK GAVT
Sbjct: 121 GRSYLKGAFIJ^GSIRDPESGKYQLEIYSWIjpHAQDLAQLMQKFMLDAKTIEHKSGAVT 180

Query: 181 YLQKAEDIMDFLIVIDAMEARDAFEEIKMIRETRNDINRANNVETANIARTITASMKTIN 240

YLQKAEDIMDFLI+I AM ++ FE IK++RE RNDINRANN ETANIA+TI+ASMKTIN
Sbjct: 181 YLQKAEDIMDFLIIIGAMSCKEDFEAIKLLREARNDINRANNAETANIAKTISASMKTIN 240

Query: 241 NIIKIMDTIGFDALPSDLRQVAQVRVAHPDYSIQQIADSLETPLSKSGVNHRLRKINKIA 300

NIIKIMDTIG ++LP +L+QVAQ+RV HPDYSIQQ+AD+LE P++KSGVNHRLRKINKIA
Sbjct: 241 NIIKIMDTIGLESLPIELQQVAQLRVKHPDYSIQQVADALEFPITKSGVNHRLRKINKIA 300

Query: 301 DEL 303

D+L
Sbjct: 301 DDL 303





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1782 

A DNA sequence (GBSxl889) was identified in S.agalactiae <SEQ ID 5541> which encodes the amino
acid sequence <SEQ ID 5542>. This protein is predicted to be dipeptidase. Analysis of this protein
sequence reveals the following:

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3544 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA86210 GB:Z38063 dipeptidase [Lactobacillus helveticus] 
Identities = 218/473 (46%) , Positives = 310/473 (65%) , Gaps = 14/473 (2%) 

Query: 3 CTTILVGKKASYDGSTMIARTEDSVNGDFTPKKLKVMTSKDQPRHYKSVLSNFEVD L 59

CTTILVGKKAS DGSTMIAR+ED P+ KV+ +DQP+HY SV+S ++D L
Sbjct: 6 CTTILVGKKASIDGSTMIARSEDG-GRVIIPEGFKVVNPEDQPKHYTSVISKQKIDDEDL 64

Query: 60 PDNPLPYTSVPDALGKDGIWGEAGINSKNVAMSATETITTNSRVLGADPLVSD---GIGE 116

+ PL YTS PD GK+GIWG AGIN+ NVAM+ATETITTNSR+ G DP++ G+GE
Sbjct: 65 AETPLRYTSAPDVSGKNGIWGAAGINADNVAMTATETITTNSRIQGVDPILDPSEGGLGE 124

Query: 117 EDILTLA7LPYIQSAREGVERLGAILEKYGTYESNGIAFSDTEEIWWLETIGGHHWIARRV 176

ED +TL LPY+ SA +GV+R+G ++EKYGTYE NG+AFSD + IW+LETIGGHHWIARR+
Sbjct: 125 EDFVTLTLPYLHSAFDGVKRVGYLVEKYGTYEMNGMAFSDKDNIWYLETIGGHHWIARRI 184

Query: 177 PDDVYVTOPNQLGIDHFEFNNCDDYMCSSDLKEFIEQYHLDLTYSNEHFNPRYAFGSQRD 236

PDD YV PN+L ID F+F++ +++ +SDLK+ I++YHL+ E +N R+ FGS
Sbjct: 185 PDDAYVIAPNRLNIDTFDFDDSENFAAASDLKDLIDEYHLN--PDREGYNMRHIFGSSTI 242

Query: 237 KDRHYIWPRSWAMQRFLNPEIEQDPRSLFIPWCQKPYRKITVEDIKYVLSDHYQDSVYDP 296

KD HYN PR+W + + +P+ P P+ + R I++EDIK+ S HYQD+ YD
Sbjct: 243 KDAHYNNPRAWYIHNYFDPDFGGTPADQDQPFICRANRLISIEDIKWAESSHYQDTPYDA 302

Query: 297 YGPEGDAVSRRAFRSVGINRTSQTSILQLRPNKSLETTGVQWLSYGSMPFATMVPLFTQV 356

YG +G ++ FR +GINR +T ILQ+R + E GVQWL++G F +M+P +T V
Sbjct: 303 YGDQGTPEQKKTFRPIGINRNFETHILQIR1TOVPAEIAGVQWLAFGPNTFNSMLPFYTNV 362

Query: 357 ETVPNYFSOTTKDASTDNFYWTNRLIARLADPHFYQHEADIESYIERTMAQGHAHINGVD 416

TP + TK + + +W N+L A L D ++ + +++ ++++AQ H + D
Sbjct: 363 TTTPEAWQTTPK- FNLNKI FWLNKLTAQLGDTNYRVYGELEDAFEQKSLAQCHKIQHETD 421

Query: 417 REVAENKE I DFQQK NQEMSDYIQKESQELLNRILFDASNLMTNRFSMGD 465

+EV + Q K NQ+MSD + + ELL +++ + LMT ++ + D
Sbjct: 422 KEVKI^SGKELQDKLIAANQKMSDTVYNlTrVELLGQMVDEGHGIiMTLKYDLLD 474





A related DNA sequence was identified in S.pyogenes <SEQ ID 5543> which encodes the amino acid 
sequence <SEQ ID 5544>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0 . 0514 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 345/464 (74%) , Positives = 407/464 (87%)

Query: 2 ACTTILVGKKASYDGSTMIARTEDSWGDFTPKKLKVMTSKDQPRHYKSVLSNFEVDLPD 61 

+CTTILVGKKASYDGSTM+ARTEDS NGDFTPKK+ V+ +DQPRHY+SV S+FE+DLPD
Sbjct: 9 SCTTILVGKKASYDGSTMVARTEDSQNGDFTPKKMIWKPEDQPRHYRSVQSSFEMDLPD 68 

Query: 62 NPLPYTSVPDALGKDGIWGEAGINSKNVAMSATETITTNSRVLGADPLVSDGIGEEDILT 121 

NP+ YTSVPDALGKDGIW EAG+N NVAMSATETITTNSRVLGADPLV+ GIGEED++T 
Sbjct: 69 NPMTYTSVPDALGKDGIWAEAGVNEANVAMSATETITTNSRVLGADPLVASGIGEEDMVT 128 

Query: 122 LVLPYIQSAREGVERIGAILEKYGTYESNGIAFSDTEEIWWLETIGGHHWIARRVPDDVY 181 

LVLPYI +SAREGV RLGAILE YGTYESNG+AFSD +IWWLETIGGHHWIARRVPDD Y 
Sbjct: 129 LVLPYIRSAREGVLRLGAILEDYGTYESNGVAFSDEHDIWWLETIGGHHWIARRVPDDAY 188 

Query: 182 VTNPNQLGIDHFEFNNCDDYMCSSDLKEFIEQYHLDLTYSNEHFNPRYAFGSQRDKDRHY 241 

VTNPNQ GIDHFEFNN +DY+CS+DLK+FI+ YHLDLTYS+EHFNPRYAFGSQRDKDR Y 
Sbjct: 189 VTNPNQFGIDHFEFNNPEDYLCSADLKDFIDTYHLDLTYSHEHFNPRYAFGSQRDKDRQY 248 



Query: 242 NTPRSWAMQRFLNPEIEQDPRSLFIPWCQKPYRKITVEDIKYVLSDHYQDSVYDPYGPEG 301 

NTPR+W MQ+FLNPEI QDPRS + WCQKPYRKITVED+KYVLS HYQD+ YDPYG EG 
Sbjct: 249 NTPRAWIMQKFLNPEIVQDPRSFAIAWCQKPYRKITVEDVKYVLSSHYQDTGYDPYGSEG 308 

Query: 302 DAVSRRAFRSVGINRTSQTSILQLRPNKSIjETTGVQWIjSYGSMPFATMVPLFTQVETVPN 361

VS++ FR +GINRTSQT+IL +RPNK E +QW++YGSMPF TMVP FTQV+T+P+ 
Sbjct: 309 TPVSKK^FRPIGINRTSQTAILHIRPNKPQEIAAIQWMAYGSMPFNT^TVPFFTQVKTIPD 368 

Query: 362 YFSNTTKDASTDNFYWTNRLIAAlADPHFyQHEADIESYIERTMAQGHAHINGVDREVAE 421 

YF+NT ++ TDNFYWTNRLIAALADPH+ HE D+++Y+E TMA+GHA ++ V+ ++ 
Sbjct: 369 YFAOTYEWFTDNFYWINRLIAALADPHYNHHETDLDNYLEETMAKGHAMLHAVEVOLLA 428 

Query: 422 NKEIDFQQKNQEMSDYIQKESQELLNRILFDASNLMTNRFSMGD 465 

+ +D +++NQ+MSDY+Q E+Q LM+ILFDASNLMTNRFS+ D 
Sbjct: 429 GETVDLEEENQKMSDYVQGETQTLLNKILFDASNLMTNRFSLSD 472 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1783 

A DNA sequence (GBSxl890) was identified in S.agalactiae <SEQ ID 5545> which encodes the amino 
acid sequence <SEQ ID 5546>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0 . 3000 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA96185 GB:Z71552 AdcA protein [Streptococcus pneumoniae] 
Identities = 257/429 (59%) , Positives = 312/429 (71%) , Gaps = 7/429 (1%) 



Query: 1 MRKKFLLLMSFVAMFAAWQLVQVKQWADSKLKVVTTFYPVYEFTKNVVGDKADVSMLIK 60

M+K LLL S A+F + Q AD KL +VTTFYPVYEFTK V GD A+V +LI
Sbjct: 1 MKKISLLIASLCALFL---VACSNQKQADGKLjNIVTTFYPVYEFTKQVAGDTANVELLIG 57

Query: 61 AGTEPHDFEPSTKNIAAIQDSNAFVYMDDNMETWAPKVAKSVKSKKVTTIKGTGDMLLTK 120

AGTEPH++EPS K +A 1QD++ FVY ++NMETW PK+ ++ KKV TIK TGDMLL
Sbjct: 58 AGTEPHEYEPSAKAVAKIQDADTFVYENENMETWVPKLLDTLDKKKVKTIKATGDMLLLP 117

Query: 121 GVEEEGEEHEGHGHEGHHHELDPHVWLSPERAISWENIRNKFVKAYPKDAASFNKNADA 180

G EEE +H+ HG EGHHHE EPHVWLSP RAI +VE+IR+ YP +F KNA A
Sbjct: 118 GGEEEEGDHD-HGEEGHHHEFDPHVWLSPVRAIKLVEHIRDTLSADYPDKKETFEKNAAA 176

Query: 181 YIAKLKELDKEYKNGLSNAKQKSFVTQHAAFGYMALDYGIiNQVPIAGLTPDAEPSSKRLG 240

YI KL+ LDK Y GLS AK+KSFVTQRAAF Y+ALDYGL QV I+GL+PDAEPS+ RL
Sbjct: 177 YIEKLQSLDKAYAEGLSQAKEKSFVTQHAAFNYLALDYGLKQVAISGLSPDAEPSAARLA 236

Query: 241 ELAKYIKKYNINYIYFEENASNKVAKTLADEVGVKTAVLSPLEGLSKKEMAAGEDYFSVM 300

EL +Y+KK I YIYFEENAS +A TL+ E GVKT VL+PLE L++++ AGE+Y SVM
Sbjct: 237 ELTEYVKKNKIAYIYFEENASQAIANTLSKEAGVKTDVIjNPLESLTEEDTKAGENYISVM 296

Query: 301 RRNLKVLKKTTDVAGKEVAPEE-DKTKTVETGYFKTKDVKDRKLTDYSGNWQSVYPLLQD 359

+NLK LK+TTD G + PE+ + TKTV+ GYF+ VKDR L+DY+GNWQSVYP L+D
Sbjct: 297 EKNLKALKQTTDQEGPAIEPEKAEDTKTVQNGYFEDAAVKDRTLSDYAGNWQSVYPFLED 356

Query: 360 GTLDPVTOYKAKSKTOMTAAEYKKYYTAGYKTDVESIKIDGKKHQMTFVRNGKSQTFTYK 419

GT D V+DYKAK MT AEYK YYT GY+TDV II + M FV+ G+S+ +TYK
Sbjct: 357 GTFDQVFDYKAKLTGKMTQAEYKAYYTKGYQTDVTKINI--TDNTMEFVQGGQSKKYTYK 414

Query: 420 YAGYKILTY 428

Y G KILTY
Sbjct: 415 YVGKKILTY 423

A related DNA sequence was identified in S.pyogenes <SEQ ID 5547> which encodes the amino acid
sequence <SEQ ID 5548>. Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < suco

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

>GP:CAA96185 GB:Z71552 AdcA protein [Streptococcus pneumoniae] 
Identities = 259/438 (59%) , Positives = 326/438 (74%) , Gaps = 16/438 (3%) 



Query: 1 MKKKILLWSLISVFFAWQLTQAKQVrAEGKVKVVTTFYPVYEFTKGVIGNDGDVFMLMK 60

MKK LL+ SL ++F + + Q A+GK+ +VTTFYPVYEFTK V G+ +V +L+
Sbjct: 1 MKKISLLLASLCALFL VACSNQKQADGKLNIVTTFYPvYEFTKQVAGDTANVELLIG 57

Query: 61 AGTEPHDFEPSTKDIKKIQDADAFvYMDDNMETWVSDVKKSLTSKKOTIVKGTGNMLLVA 120

AGTEPH++EPS K + KIQDAD FVY ++NMETWV + +L KKV +K TG+MLL+
Sbjct: 58 AGTEPHEYEPSAKAVAKIQDADTFVYENENMETWVPKLLDTLDKKKVKTIKATGDMLLLP 117

Query: 121 GAGHDHPHEDADKKHEHNIOISEEGHNHAFDPHvWLSPYRSITvVENIRDSLSKAYPEKAE 180

G E+ + H+H EEGH+H FDPHVWLSP R+I +VE+IRD+LS YP+K E
Sbjct: 118 GG EEEEGDHDHG--- EEGHHHEFDPHVWLSP VRAI KLVEH IRDTLSADYPDKKE 168

Query: 181 NFKANAATYIEKLKELDKDYTAALSDAKQKSFVTQHAAFGYMALDYGLNQISINGVTPDA 240

F+ NAA YIEKL+ LDK Y LS AK+KSFVTQHAAF Y+ALDYGL Q++I+G++PDA
Sbjct: 169 TFEKNAAAYIEKLQSLDKAYREGLSQAKEKSFVTQHAAFNYLALDYGLKQVAISGLSPDA 228

Query: 241 EPSAKRIATLSKYVKKYGIKYIYFEENASSKVAKTLAKEAGVKAAVLSPLEGLTEKEMKA 300

EPSA R+A L++YVKK I YIYFEENAS +A TL+KEAGVK VL+PLE LTE++ KA
Sbjct: 229 EPSAARLAELTEYVKKNKIAYIYFEENASC^LANTLSKEAGVKTDVLNPLESLTEEDTKA 288

Query: 301 GQDYFTVMRKNLETLRLTTDVAGKEIBPEK-DTTKTVYNGYFKDKEVKDRQLSDWSGSWQ 359 

G++Y +VM KNL+ L+ TTD G I PEK + TKTV NGYF+D VKDR LSD++G+WQ 
Sbjct: 289 GENYISVMEKNIjKALKQTTDQEGPAIEPEKAEDTKTVQNGYFEDARVKDRTLSDYAGNWQ 348 

Query: 360 SVYPYLQDGTLDQVWDYKAKKSKGKMTAAEYKDYYTTGYKTDVEQIKINGKKKTMTFVRN 419 

SVYP+L+DGT DQV+DYKAK + GKMT AEYK YYT GY+TDV KIN , TM FV+ 
Sbjct: 349 SVYPFLEDGTFDQVFDYKAKLT-GKMTQAEYKAYYTKGYQTDV--TKINITDNTMEFVQG 405 

Query: 420 GEKKTFTYTYAGKE I LTY 437 

G+ K +TY Y GK+ILTY 
Sbjct: 406 GQSKKYTYKYVGKKILTY 423 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 353/515 (68%) , Positives = 422/515 (81%) , Gaps = 9/515 (1%) 



Query: 1 MRKKFLLLMSWAMFAAWQLVQVKQWADSICLKVWTFYPWEFTKNWGDKADVSMIjIK 60

M+KK LL+MS +++F AWQL Q KQV A+ K+KWTTFYPVYEFTK V+G+ DV ML+K
Sbjct: 1 MKKKILLMMSLISVFFAWQLTQAKQVIAEGKVKWrTFYPVYEFTKGVIGNDGDVFMLMK 60

Query: 61 AGTEPHDFEPSTKNIAAIQDSNAFVYMDD1METWAPKVAKSVKSKKVTTIKGTGDMLLTK 120

AGTEPHDFEPSTK+ 1 IQD++AFVYMDDNMETW V KS+ SKKVT +KGTG+MLL
Sbjct: 61 AGTEPHDFEPSTKDIKKIQDADAFVYI^DNIffiTWVSDVKKSLTSKKOTIVKGTGNMLLVA 120

Query: 121 GV EEEGEEHEGHGHEGHHHELDPHVVJLSPERAISVVFjNIRNKFVKAYPKDAA 172

G ++ EH H EGH+H DPHVWLSP R+I+WENIR+ KAYP+ A
Sbjct: 121 GAGHDHPHEDADKKHEHNKHSEEGHNHAFDPHVWLSPYRSITVVENIRDSLSKAYPEKAE 180

Query: 173 SFNKNADAYIAKLKELDKEYKNGLSNAKQKSFVTQHARFGYMALDYGLNQVPIAGLTPDA 232

+F NA YI KLKELDK+Y LS+AKQKSFVTQHAAFGYMALDYGLNQ+ I G+TPDA
Sbjct: 181 NFKANAATYIEKLKELDKDYTAALSDAKQKSFVTQHftAFGYMALDYGIiNQI S INGVTPDA 240

Query: 233 EPSSKRLGE^KYIKICYNINYIYFEENASNKVAKTLADEVGVKTAVLSPLEGLSKKEMAA 292 

EPS+KR+ L+KY+KKY I YIYFEENAS+KVAKTLA E GVK AVLSPLEGL++KEM A 
Sbjct: 241 EPSAKRIATLSKYVKKYGIKYIYFEENASSKWAKTLAKEAGVKAAVLSPLEGLTEKEMKA 300 

Query: 293 GEDYFSVmRNLK^KKTTDVAGKEVAPEEDKTKTVETGYFKTKDVKDRKLTDYSGNWQS 352 

G+DYF+VMR+NL+ L+ TTDVAGKE+ PE+D TKTV GYFK K+VKDR+L+D+SG+WQS 
Sbjct: 301 GQDYFTVMRKNLETLRLTTDVAGKEILPEKDTTKTVYNGYFKDKEVKDRQLSDWSGSWQS 360 

Query: 353 WPLLQDGTLDPVVJDYKA-KSKKDMTAAEYKKYYTAGYKTDVESIKIDGKKHQMTFVRNG 411 

VYP LQDGTLD VWDYKA KSK MTAAEYK YYT GYKTDVE IKI+GKK MTFVRNG 
Sbjct: 361 WPYLQDGTLDQVWDYKAKKSKGKMTAAEYKDYYTTGYKTDVEQIKINGKKKTMTFVRNG 420 

Query: 412 KSQTFTYKYAGYKILTYKKGNRGVRYLFEAKEKDAGQFKYIQFSDHGIKPNKAEHFHIFW 471

+ +TFTY YAG +ILTY KGNRGVR+ + FEAKE DAG+FKY+QFSDH I P KA+HFH++W 
Sbjct: 421 EKKTFTYTYAGKEILTYPKGNRGVRFMFEAKEADAGEFKYVQFSDHAIAPEKAKHFHLYW 480 

Query: 472 GSESQEKLFEEMENWPTYFPAKMSGREVAQDLMSH 506 
G +SQEKL +E+E+WPTY+ + +SGRE+AQ++ +H

Sbjct: 481 GGDSQEKLHKELEHWPTYYGSDLSGREIAQEINAH 515 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8899> and protein <SEQ ID 8900> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 3
SRCFLG: 0
McG: Length of DE: 19
Peak Value of UR: 2.79
Net Charge of CR: 3
McG: Discrim Score: 9.08
GvH: Signal Score (-7.5): 2.59
Possible site: 15
>>> Seems to have a cleavable N-term signal seq.
Amino Acid Composition: calculated from 16
ALOM program count: 0 value: 7.69 threshold: 0.0
PERIPHERAL Likelihood = 7.69 264
modified ALOM score: -2.04

*** Reasoning Step: 3

Rule gpol

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases:

3758895|emb|CAA96185.1| |Z71552 AdcA protein {Streptococcus pneumoniae} >PIR|T46756 |T46756
Zn-binding lipoprotein adcA [imported] - Streptococcus pneumoniae (fragment)

Score = 508 bits (1294) , Expect = e-143

Identities = 257/429 (59%) , Positives = 312/429 (71%) , Gaps = 7/429 (1%)

Query: 1 MRKKFLLLMSFVAMFAAWQLVQVKQWADSKLKVVTTFYPVYEFTKNWGDKADVSMLIK 60

M+K LLL S A+F + Q AD KL +VTTFYPVYEFTK V GD A+V +LI
Sbjct: 1 MKKISLLLASLCALFL VACSNQKQADGKLNIVTTFYPVYEFTKQVAGDTANVELLIG 57



Query: 61 AGTEPHDFEPSTKNIAAIQDSNAFVYMDDNMETWAPKVAKSVKSKKVTTIKGTGDMLLTK 120 
AGTEPH++EPS K +A IQD++ FVY ++NMETW PK+ ++ KKV TIK TGDMLL 
Sbjct: 58 AGTEPHEYEPSAKAVAKIQDJfflTFVYENENMETWVPKLLDTLDKKKVKTIKATGDMLLLP 117

Query: 121 GVEEEGEEHEGHGHEGHHHELDPHVWLSPERAISVVENIRHKFVKAYPKDAASFNKNADA 180

G EEE +H+ HG EGHHHE DPHVWLSP RAI +VE+IR+ YP +F KNA A
Sbjct: 118 GGEEEEGDHD-HGEEGHHHEFDPHVWLSPVRAIKLVEHIRDTLSADYPDKKETFEKNAAA 176

Query: 181 YIAKLKELDKEYKNGLSNAKQKSFVTQHAAFGYMALDYGLNQVPIAGLTPDAEPSSKRLG 240

YI KL+ LDK Y GLS AK+KSFVTQHAAF Y+ALDYGL QV I+GL+PDAEPS+ RL
Sbjct: 177 YIEKLQSLDKAYAEGLSQAKEKSFVTQHAAFNYLALDYGLKQVAISGLSPDAEPSAARLA 236

Query: 241 ELAKYIKJCYNIOTIYFEENASNKVAXTLADEVGVKTAVLSPLEGLSKKEMAAGEDYFSVM 300

EL +Y+KK I YIYFEENAS +A TL+ E GVKT VL+PLE L++++ AGE+Y SVM
Sbjct: 237 ELTEWKKNKIAYIYFEENASQAIiANTLSKEAGVKTDVLNPLESLTEEDTKAGENYISVM 296

Query: 301 RRNLKVLKKTTDVAGKEVAPEE-DKTKTVETGYFKTKDVKDRKLTDYSGNWQSVYPLLQD 359

+NLK LK+TTD G + PE+ + TKTV+ GYF+ VKDR L+DY+GNWQSVYP L+D
Sbjct: 297 EKNLKALKQTTDQEGPAIEPEKAEDTKTVQNGYFEDAAVKDRTLSDYAGKWQSVYPFLED 356

Query: 360 GTLDPVWDYKAKSKKDMTAAEYKKYYTAGYKTDVESIKIDGKXHQMTFVRNGKSQTFTYK 419

GT D V+DYKAK MT AEYK YYT GY+TDV II + M FV+ G+S+ +TYK
Sbjct: 357 GTFDQVFDYKAKLTGKMTQAEYKAYYTKGYQTDVTKINI--TDNTMEFVQGGQSKKYTYK 414

Query: 420 YAGYKILTY 428

Y G KILTY
Sbjct: 415 YVGKKILTY 423





SEQ ID 8900 (GBS325) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 65 (lane 3; MW 58kDa). 

The GBS325-His fusion product was purified (Figure 210, lane 7) and used to immunise mice. The 
resulting antiserum was used for Western blot (Figure 257A) and FACS (Figure 257B). These tests confirm 
that the protein is immunoaccessible on GBS bacteria. 

Example 1784 

A DNA sequence (GBSxl891) was identified in S.agalactiae <SEQ ID 5549> which encodes the amino 
acid sequence <SEQ ID 5550>. This protein is predicted to be ribosomal protein L31 (rl31). Analysis of this 
protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence



A related GBS nucleic acid sequence <SEQ ID 9637> which encodes amino acid sequence <SEQ ID 9638> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF80389 GB:AF160251 ribosomal protein L31 [Listeria innocua] 
Identities = 61/81 (75%) , Positives = 71/81 (87%) , Gaps = 1/81 (1%) 

Query: 9 MKKDIHPDYRPVVFLDTTTGYKFLSGSTKSTKETVEFE-GETYPLIRVEISSDSHPFYTG 67 

MK IHP+YRPWF+DT+T +KFLSGSTKS+ ET+++E G YPL+RVEISSDSHPFYTG 
Sbjct: 1 MKTGIHPEYRPWFVDTSTDFKFLSGSTKSSSETIKWEDGNEYPLLRVEISSDSHPFYTG 60 

Query: 68 RQKFTQADGRVDRFNKKYGLK 88 

+QK ADGRVDRFNKKYGLK 
Sbjct: 61 KQKHATADGRVDRFNKKYGLK 81 
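
The "Identities" figure in an alignment summary is the number of aligned columns in which query and subject carry the same residue. The sketch below counts identities for the last block of the alignment above (query residues 68 to 88 against subject residues 61 to 81). The function name `count_identities` is illustrative. "Positives" are not computed, because they depend on the substitution matrix used for the search, which is not stated here.

```python
def count_identities(query, sbjct):
    """Count columns where both aligned rows carry the same residue.

    Gap characters ('-') never count as identities. Both rows must have
    the same length, since they are rows of one alignment block.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


query = "RQKFTQADGRVDRFNKKYGLK"   # Query residues 68-88
sbjct = "KQKHATADGRVDRFNKKYGLK"   # Sbjct residues 61-81
identical = count_identities(query, sbjct)
print(f"{identical}/{len(query)} identical columns in this block")
```

Summing the per-block counts over every block of an alignment gives the numerator of the "Identities" fraction.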



Final Results 



bacterial cytoplasm Certainty=0 . 1948 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5551> which encodes the amino acid
sequence <SEQ ID 5552>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 1910 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 81/86 (94%) , Positives = 86/86 (99%) 

Query: 9 MKKDIHPDYRPWFLDTTTGYKFLSGSTKSTKETVEFEGETYPLIRVEISSDSHPFYTGR 68 

M+KDIHPDYRPWFLDTTTGY+FLSGSTK++KETVEFEGETYPLIRVEISSDSHPFYTGR 
Sbjct: 1 MRKDIHPDYRPWFLDTTTGYQFLSGSTKASKETVEFEGETYPLIRVEISSDSHPFYTGR 60 

Query: 69 QKFTQADGRVDRFNKKYGLKDANAAQ 94 

QKFTQADGRVDRFNKKYGLKDANAA+ 
Sbjct: 61 QKFTQADGRTORFNKKYGLKDANAAK 86 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1785 

A DNA sequence (GBSxl892) was identified in S.agalactiae <SEQ ID 5553> which encodes the amino 
acid sequence <SEQ ID 5554>. This protein is predicted to be aspartate aminotransferase (aspC). Analysis 
of this protein sequence reveals the following: 
Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0 . 1740 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 9421> which encodes amino acid sequence <SEQ ID 9422> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC21948 GB:U32714 aminotransferase [Haemophilus influenzae Rd] 
Identities = 200/323 (61%) , Positives = 264/323 (80%) , Gaps = 1/323 (0%) 



Query: 1 MQYYQLQNI-HTOMDDIYI vNGVSEGISMSMQALLDNDDEVLVPMPDYPLWTACVSLAGG 59

+QYYQ + I ++D+YI NGVSE I+M+MQALL++ DEVLVPMPDYPLVJTA V+L+GG
Sbjct: 82 VQYYQSKGILGATVNDVYIGNGVSELITMAMQALLNDGDEVLVPMPDYPLWTAAVTLSGG 141

Query: 60 NAWYICDEEANWYPDIDDIKSKITSKTKAIVLINPNNPTGAVYPREILQEIVDIARQND 119

AVHY+CDE+ANW+P IDDIK+K+ +KTKAIV+ INPNNPTGAVY +E+LQEIV+IARQN+
Sbjct: 142 KAVOTLCDEDANWFPTIDDIKAKVNAKTKAIVIINPNNPTGAVYSKELLQEIVEIARQNN 201

Query: 120 LI I FSDEVYDRL vMDGMEHI PI AS I AEDIFTVTLSGLSKSHRI CGFRVGWMVLSGPRQHV 179

LIIF+DE+YD+++ DG H IA++A D+ TVTL+GLSK++R+ GFR GWM+L+GP+ +
Sbjct: 202 LIIFADEIYDKILYDGAVBHHIAALAPDLLTOTLN^ 261

Query: 180 KGYIEGI^IANMRLCSNVIAQQVIQTSLGGQQSIDSMLLPGGRIYEQRNYIHKAINEIP 239

KGYIEGL+MLA+MRLC+NV Q IQT+LGG QSI+ +LPGGR+ EQRN + I +IP
Sbjct: 262 KGYIEGLDMLASMRLCANVPMQHAIQTALGGYQSINEFILPGGRLLEQRNKAYDLITQIP 321

Query: 240 GLSAVKPNAGLYLFPKIDTD^mlIDOT3EEFVIOTLKQEKTOLTHGRGBWNTADHFRIVY 299

G++ VKP +Y+FPKID + I +DE+ VL+ L+QEKVLL HG+GFN ++ DHFRIV 
Sbjct: 322 GITCVKPMGAMYMFPKIDVKKFNIHSDEKMVLDLLRQEKVLLVHGKGFNWHSPDHFRIVT 381 

Query: 300 LPRVDELTELQEKMARFLSQYKR 322 

LP V++L E K+ARFLS Y++ 
Sbjct: 382 LPYVNQLEEAITKLARFLSDYRQ 404 

There is also homology to SEQ ID 3662. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1786 

A DNA sequence (GBSxl893) was identified in S.agalactiae <SEQ ID 5555> which encodes the amino 
acid sequence <SEQ ID 5556>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.02 Transmembrane 164 - 180 ( 163 - 181) 
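
A predicted membrane-spanning segment is reported as a likelihood score followed by the first and last residue of the segment. The sketch below extracts those three values from such a line and derives the segment length. The regular expression and the name `parse_transmembrane` are assumptions for illustration; the meaning of the second, parenthesised range is not stated in this text, so the sketch ignores it.

```python
import re

# Likelihood score, then the first and last residue of the predicted segment.
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_transmembrane(line):
    """Return (likelihood, start, end) or None if the line does not match."""
    match = TM_RE.search(line)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2)), int(match.group(3))


line = "INTEGRAL Likelihood = -2.02 Transmembrane 164 - 180 ( 163 - 181)"
likelihood, start, end = parse_transmembrane(line)
segment_length = end - start + 1
print(f"likelihood {likelihood}, residues {start}-{end}, length {segment_length}")
```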



Final Results 

bacterial membrane Certainty=0 . 1808 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 10099> which encodes amino acid sequence <SEQ ID 
10100> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06181 GB:AP001515 transcriptional pleiotropic repressor 
[Bacillus halodurans] 
Identities = 129/257 (50%) , Positives = 181/257 (70%) , Gaps = 3/257 (1%) 

Query: 23 NLLEKTRKITSILQRSVDSLDAELPYNTMAAQLM)IIDCNACIINGGGNLLGYAMKYKTN 82

+LL + RKI +LQ+S + + MA L D+I N +++ G LLG+A+K +
Sbjct: 2

Query: 83

+R+++ E +QFP+ Y +V +T ANL ++++ + FPVE KE F+ G+TTI PI G
Sbjct: 59

Query: 143

GG RLGT 1+ R + F+DDDLIL E +TWG+++L+ +T+ +EE R + V MAI++
Sbjct: 119

Query: 203

LSYSE++AV I ELDG EG L AS IADR+GITRSVI VNALRKLESAG+ IESRSLGMK
Sbjct: 179

Query: 263

GTY+KV+N+ +L++
Sbjct: 239 GTYI KVLNDKFL VELEK 255

A related DNA sequence was identified in S.pyogenes <SEQ ID 5557> which encodes the amino acid
sequence <SEQ ID 5558>. Analysis of this protein sequence reveals the following:

Possible site: 38

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.64 Transmembrane 144 - 160 ( 143 - 161)

Final Results 

bacterial membrane Certainty=0.1256 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco



The protein has homology with the following sequences in the databases: 

>GP:CAB13490 GB:Z99112 transcriptional regulator [Bacillus subtilis] 
Identities = 131/255 (51%) , Positives = 179/255 (69%) , Gaps = 3/255 (1%) 



Query: 4 LLEKTRKITSILQRSVDSLETELPYNTMASRLADIIDCNACIINGGGTLLGYAMKYKTNT 63

LL+KTR I S+LQ + + + MA L D+ID N +++ G LLGY++ +
Sbjct: 3 LLQKTRIINSMLQAAAGK PVNFKEMAETLRDVIDSN1FWSRRGKLLGYSINQQIEN 59

Query: 64 DRVEEFFEAKQFPDTYVKAASRVYDTEANLSVENELTIFPVESKDTYPGGLTTIAPIYGG 123 

DR+++ E +QFP+ Y K V +T +NL + +E T FPVE++D + GLTTI PI GG 
Sbjct: 60 DRMKKMLEDRQFPEEYTKNLFNVPETSSNLDINSEYTAFPVENRDLFQAGLTTIVPIIGG 119 

Query: 124 GMRLGSLIIWRNDNEFSDDDLILVEISSTWGIQLLNLQTENLEDTIRKQTAVNMAINTL 183 

G RLG+LI+ R ++F+DDDLIL E +TWG+++L + E +E+ R + V MAI++L 
Sbjct: 120 GERLGTLILSRLQDQFNDDDLILAEYGATWGMEILREKAEEIEEEARSKAWQMAISSL 179 

Query: 184 SYSEMKAVAAILGELDGNEGRLTASVIADRIGITRSVIVNALRKLESAGIIESRSLGMKG 243 

SYSE++A+ I ELDGNEG L AS IADR+GITRSVI VNALRKLESAG+ IESRSLGMKG 
Sbjct: 180 SYSELEAIEHIFEELDGNEGLLVASKIADRVGITRSVIVNALRKLESAGVIESRSLGMKG 239 

Query: 244 TYLKVINEGI FAKLK 258 

TY+KV+N +L+ 
Sbjct: 240 TYIKVLNNKFLIELE 254 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 232/260 (89%) , Positives = 247/260 (94%) 

Query: 21 MPNLLEKTRKITSILQRSVDSLDRELPYNTMAAQLADIIDCNACIINGGGNLLGYAMKYK 80 

MPNLLEKTRKITSILQRSVDSL+ ELPYNTMA++IADIIDCNACIINGGG LLGYAMKYK 
Sbjct: 1 MPNLLEKTRKITSILQRSVDSLETELPYNTMRSRLRDIIDCNACIINGGGTLLGYAMKYK 60 

Query: 81 TNTDRVEEFFETKQFPDYYVKSASRVYDTEANLSVDNDLSIFPVETKENFQDGITTIAPI 140 

TNTDRVEEFFE KQFPD YVK+ASRVYDTEANLSV+N+L+ 1 FPVE+K+ + G+TTIAPI 
Sbjct: 61 TNTDRVEEFFFAKQFPDTWKAASRWDTEANLSVENELTIFPVESKDTYPGGLTTIAPI 120 

Query: 141 YGGGMRLGTFIIWRNDKEFSDDDLILVEIASTWGIQLLNLQTENLEENIRKQTAVTMAI 200 

YGGGMRLG+ IIWRND EFSDDDLILVEI+STWGIQLLNLQTENLE+ IRKQTAV MAI 
Sbjct: 121 YGGGMRLGSLIIWRNDNEFSDDDLILVEISSTWGIQLLNLQTENLEDTIRKQTAVNMAI 180 

Query: 201 NTLSYSEMKAVAAILGELDGLEGRLTASVIADRIGITRSVIVNALRKLESAGIIESRSLG 260 

NTLSYSEMKAVAAI LGELDG EGRLTASVIADRIGITRSVIVNALRKLESAGIIESRSLG 
Sbjct: 181 NTLSYSEMKAVAA.ILGELDGNEGRLTASVIADRIGITRSVIVNALRKLESAGIIESRSLG 240 

Query: 261 MKGTYLKVINEGIFDKLKEY 280 

MKGTYLKVINEGIF KLKE+ 
Sbjct: 241 MKGTYLKVINEGIFAKLKEF 260 

A related GBS gene <SEQ ID 8901> and protein <SEQ ID 8902> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9 
McG: Discrim Score: -6.84 
GvH: Signal Score (-7.5): -5.37 

Possible site: 13 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 1 value: -2.02 threshold: 0.0 

INTEGRAL Likelihood = -2.02 Transmembrane 114 - 130 ( 113 - 131) 
PERIPHERAL Likelihood = 3.61 179 
modified ALOM score: 0.90 



*** Reasoning Step: 3 

Final Results

bacterial membrane Certainty=0.1808 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases: 

ORF02556(223 - 987 of 1293) 

EGAD|l3275|BS1617(4 - 255 of 259) cody protein {Bacillus subtilis} OMNI |NT01BS1895 cody
protein (vegetative protein 286b) (veg286b) GP|535351|gb|AAB03372.1||U13634 CodY {Bacillus
subtilis} GP|2633989|emb|CAB13490.1||Z99112 transcriptional regulator {Bacillus subtilis}
PIR|S61496|S61496 transcription pleiotropic repressor codY - Bacillus subtilis
%Match = 29.1
%Identity = 50.6 %Similarity = 71.5
Matches = 128 Mismatches = 71 Conservative Sub.s = 53

177 207 237 267 297 327 357 387 

DCKS*NALI*L*RKTYKG*RKCRIYLEKTRKITSILQRSVDSLDAELPYNTMAAQLADIIDCINACIINGGGNLLGYAM^ 

1=111 I 1=11 = ==111 1=11 I === I 1111== 

MALLQKTRIINSMLQAAAGK PVNFKEMAETLRDVIDSNIFWSRRGKIiLGYSINQ

10 20 30 40 50 






417 447 477 507 537 567 597 627 

KTNTDRVEEFFETKQFPDYWKSASRWDTEANLSvDNDLSIFPVETKENFQDGITTIAPIYGGGMRLGTFIIWRNDKEF 

= ll = = : =1 =111= 11= I =1 =11 = = = = = Mil == 1 I 1 = 111 II III 1111 = 1= I =1 
QIENDRMKKMLEDRQFPEEYTKNLFNVPETSSNLDINSEYTAFPVENRDLFQAGLTTIVPIIGGGERLGTLILSRLQDQF 
70 80 90 100 110 120 130 



657 687 717 747 777 807 837 867 

SDDDLILVEIASTWGIQLLNLQTENLEENIRKQ/T^

=111111 I =llll===l =1=11 I = I |||::|||||::|: 1= llll II I II 1111=11111 
NDDDLI]jAEYGATWGMEILREKAEEIEEEARSKAWQMAISSLSYSELEAIEHIFEELIX3NEGLLVASKIADRVGITRS 
150 160 170 180 190 200 210 

897 927 957 987 1017 1047 1077 1107

VIWALRKLESAGIIESRSLGMKGTYLKVINEGIFDKLKEYN*S*HGTGSSFQFLFWNQEEIRRKMTXXN*LXXLFS*RL 

lllllllllllll=llllllllllll=ll=l = =1= 
VIVNAIjRKLESAGVIESRSLGMKGTYIiFCVLNNKFLIELENLKSH 
230 240 250 

SEQ ID 8902 (GBS431) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 173 (lane 7; MW 54kDa). It was also expressed in E.coli as a His-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 77 (lane 6; MW 29kDa). 

GBS431-GST was purified as shown in Figure 223, lane 8. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1787 

A DNA sequence (GBSxl894) was identified in S.agalactiae <SEQ ID 5559> which encodes the amino 
acid sequence <SEQ ID 5560>. This protein is predicted to be isochorismatase. Analysis of this protein 
sequence reveals the following: 

Possible site: 35

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.81 Transmembrane 126 - 142 ( 125 - 142)

Final Results

bacterial membrane Certainty=0.2126 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
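
The "Final Results" lines above follow a fixed pattern (compartment, certainty, verdict), so they can be read mechanically. The following is a minimal sketch, not part of the original analysis: the function names `parse_final_results` and `best_location` are hypothetical, and the regular expression is an assumption about the line layout that also tolerates stray spaces inside the number.

```python
import re

# Hypothetical helper (not from the source): parse "Final Results" lines into
# (compartment, certainty, verdict) tuples. Stray spaces inside the number,
# e.g. "0 . 0000", are tolerated and removed before conversion.
RESULT_RE = re.compile(
    r"^\s*(bacterial\s+\w+)\s*[-\u2014]*\s*Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]*)\)"
)

def parse_final_results(lines):
    results = []
    for line in lines:
        m = RESULT_RE.match(line)
        if m is None:
            continue  # skip headings and anything that is not a result line
        certainty = float(re.sub(r"\s+", "", m.group(2)))
        compartment = m.group(1).split()[1]
        results.append((compartment, certainty, m.group(3).strip()))
    return results

def best_location(results):
    # The compartment with the highest certainty value.
    return max(results, key=lambda r: r[1])

example = [
    "bacterial membrane Certainty=0. 2126 (Affirmative) < succ>",
    "bacterial outside Certainty=0 . 0000 (Not Clear) < succ>",
    "bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < succ>",
]
print(best_location(parse_final_results(example)))
```

Applied to the three lines above, the sketch selects the membrane entry, which is the only one marked "Affirmative".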

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15164 GB:Z99120 similar to pyrazinamidase/nicotinamidase 
[Bacillus subtilis] 
Identities = 99/181 (54%) , Positives = 132/181 (72%) 

Query: 1   MTKALISIDYTYDFVADDGKLTAGKPAQSIASAIADVTEKAYRSGDYIFFAIDNHDIGDV 60
           M KALI IDYT DFVA DGKLT G+P + I  AI ++T++   +GDY+  A+D+HD GD
Sbjct: 1   MKKALICIDYTNDFVASDGKLTCGEPGRMIEEAIVNLTKEFITNGDYVVIAVDSHDEGDQ 60

Query: 61  FHPESNLFPEHNIKGTSGRNLYGPLGTLYETIKEDSRVFWIDKRHYSAFSGTDLDIRLRE 120
           +HPE+ LFP HNIKGT G++LYG L  LY+  + +  V++++K  YSAF+GTDL+++LRE
Sbjct: 61  YHPETRLFPPHNIKGTEGKDLYGKLLPLYQKHEHEPNVYYMEKTRYSAFAGTDLELKLRE 120

Query: 121 RRVDTLILTGVLTDICVLHTAIDAYNLGYKIEVPAAAVASLNDSNHQWALNHFKTVLGATI 181
           R++  L L GV TDICVLHTA+DAYN G++I V   AVAS N   H WAL+HF   +GA +
Sbjct: 121 RQIGELHLAGVCTDICVLHTAVDAYNKGFRIVVHKQAVASFNQEGHAWALSHFANSIGAQV 181

A related DNA sequence was identified in S.pyogenes <SEQ ID 5561> which encodes the amino acid
sequence <SEQ ID 5562>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.60 Transmembrane 126 - 142 ( 126 - 142)

Final Results

bacterial membrane Certainty=0.2041 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB15164 GB:Z99120 similar to pyrazinamidase/nicotinamidase
[Bacillus subtilis]
Identities = 90/179 (50%) , Positives = 127/179 (70%)

Query: 3   RALISIDYTNDFVADDGKLSAGKSAQAIATKIAEVTKTAFDQGDYIFFAIDCHDQNDSWH 62
           +ALI IDYTNDFVA DGKL+ G+  + I   I  +TK     GDY+  A+D HD+ D +H
Sbjct: 3   KALICIDYTNDFVASDGKLTCGEPGRMIEEAIVNLTKEFITNGDYVVLAVDSHDEGDQYH 62

Query: 63  PESKLFAAHNIKGTTGRHLYGPLAEVYSYMKQHPRVFWIDKRYYSAFSGTDLDIRLRERG 122
           PE++LF  HNIKGT G+ LYG L  +Y   +  P V++++K  YSAF+GTDL+++LRER
Sbjct: 63  PETRLFPPHNIKGTEGKDLYGKLLPLYQKHEHEPNVYYMEKTRYSAFAGTDLELKLRERQ 122

Query: 123 ITQLVLTGVLSDICVLHTAIDAYHLGYQLEIVKSAVASLTKESYEWSLAHFEQVLGAKL 181
           I +L L GV +DICVLHTA+DAY+ G+++ + K AVAS  +E + W+L+HF   +GA++
Sbjct: 123 IGELHLAGVCTDICVLHTAVDAYNKGFRIVVHKQAVASFNQEGHAWALSHFANSIGAQV 181


An alignment of the GAS and GBS proteins is shown below. 

Identities = 121/180 (67%) , Positives = 150/180 (83%) 

Query: 3   KALISIDYTYDFVADDGKLTAGKPAQSIASAIADVTEKAYRSGDYIFFAIDNHDIGDVFH 62
           +ALISIDYT DFVADDGKL+AGK AQ+IA+ IA+VT+ A+  GDYIFFAID HD  D +H
Sbjct: 3   RALISIDYTNDFVADDGKLSAGKSAQAIATKIAEVTKTAFDQGDYIFFAIDCHDQNDSWH 62

Query: 63  PESNLFPEHNIKGTSGRNLYGPLGTLYETIKEDSRVFWIDKRHYSAFSGTDLDIRLRERR 122
           PES LF  HNIKGT+GR+LYGPL  +Y  +K+  RVFWIDKR+YSAFSGTDLDIRLRER
Sbjct: 63  PESKLFAAHNIKGTTGRHLYGPLAEVYSYMKQHPRVFWIDKRYYSAFSGTDLDIRLRERG 122

Query: 123 VDTLILTGVLTDICVLHTAIDAYNLGYKIEVPAAAVASLNDSNHQWALNHFKTVLGATIL 182
           +  L+LTGVL+DICVLHTAIDAY+LGY++E+  +AVASL   +++W+L HF+ VLGA ++
Sbjct: 123 ITQLVLTGVLSDICVLHTAIDAYHLGYQLEIVKSAVASLTKESYEWSLAHFEQVLGAKLI 182






Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1788

A DNA sequence (GBSx1895) was identified in S.agalactiae <SEQ ID 5563> which encodes the amino
acid sequence <SEQ ID 5564>. Analysis of this protein sequence reveals the following: 

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1539 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1789 

A DNA sequence (GBSx1896) was identified in S.agalactiae <SEQ ID 5565> which encodes the amino
acid sequence <SEQ ID 5566>. This protein is predicted to be 3-hydroxyacyl-CoA dehydrogenase (hbd-10).
Analysis of this protein sequence reveals the following:

Possible site: 46

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -0.27 Transmembrane 3 - 19 ( 1 - 19) 
INTEGRAL Likelihood = -0.11 Transmembrane 277 - 293 ( 277 - 294) 
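
Each INTEGRAL line above carries a likelihood score, a predicted transmembrane span and a second span in parentheses. A minimal sketch of how such a line might be parsed is given below; `parse_integral` and the field names `core` and `extended` are hypothetical labels chosen for illustration and are not terms used by the source.

```python
import re

# Hypothetical parser (not from the source) for lines of the form
# "INTEGRAL Likelihood = -0.27 Transmembrane 3 - 19 ( 1 - 19)".
INTEGRAL_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_integral(line):
    m = INTEGRAL_RE.search(line)
    if m is None:
        return None
    core = (int(m.group(2)), int(m.group(3)))      # first residue range
    extended = (int(m.group(4)), int(m.group(5)))  # range given in parentheses
    return {
        "likelihood": float(m.group(1)),
        "core": core,
        "extended": extended,
        "core_length": core[1] - core[0] + 1,      # inclusive residue count
    }

for text in (
    "INTEGRAL Likelihood = -0.27 Transmembrane 3 - 19 ( 1 - 19)",
    "INTEGRAL Likelihood = -0.11 Transmembrane 277 - 293 ( 277 - 294)",
):
    print(parse_integral(text))
```

Both segments listed for this protein span 17 residues by this count.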

Final Results

bacterial membrane Certainty=0.1107 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF12219 GB:AE001862 3-hydroxyacyl-CoA dehydrogenase, putative
[Deinococcus radiodurans] 
Identities = 151/321 (47%) , Positives = 196/321 (61%) , Gaps = 36/321 (11%) 

Query: 56 NMTIKNLOTAGSGVLGSQIAFQAAYKGMSOTIYDINDEAI^KGKERIKKLAKVYQSEIET 115

+M+IK +TV GSGVLGSQIAFQ A+ G V +YDIND A+ K +E + KL YQ +++ 
Sbjct: 51 SMSIKTVTVCGSGVLGSQIAFQTAFHGFDVHLYDINDAAIAKARETLGKLQARYQQDLKV 110 

Query: 116 AKEAYSDKAKSIKYNKNLLPSLDHIFLSKVADSLDLIADLPNQITFSKNLDQAVSDADLV 175 
+ D +I+F ++ +AV DLV

Sbjct: 111 DAQQTGDAFA RISFFTDIAEAVKGVDLV 138 

Query: 176 IEAVPETVSIKEDFYKQLAKVAPSKTIFATNSSTLVPSQFADITGRPDKFLAMHFANNIW 235
IEA+PE + IK FY QL +VA TIFATNSSTL+PSQF + TGRP+KFLA+HFAN IW
Sbjct: 139 IEAIPENMDIKRKFYNQLGEVADPNTIFATNSSTLLPSQFMEETGRPEKFLALHFANEIW 198

Query: 236 QNNIVEIMGHKGTDDEVIKFALAFSKDIGMVPLHIHKEQPGYILNSILVPFLESALALYY 295 

+ N EIM TDD V + F+KDIGMV L ++KEQ GYILN++LVP L +AL L 
Sbjct: 199 KFNTAEIMRTPRTDDAVFDTWQFAKDIGMVALPMYKECAGYIIjNTLLVPLLGAALELW 258 


Query: 296 DKVSDSETIDKTWKLGTGAPMGPLEILDIIGIDTAYNIMKNYSDTNSDPNSLHAHLAKML 355 

++D +T+DKTW + TGAP GP LD+IG+ T YNI N + ++P S A AK + 
Sbjct: 259 KGIADPQTVDKTWMIATGAPRGPFAFLDVIGLTTPYNI--NMASAETNPGS- -AAAAKYI 314 

Query: 356 KEEFIDKGRTGKAAGHGFYDY 376

KE +IDKG+ GAG GFY Y
Sbjct: 315 KENYIDKGKLGTATGEGFYKY 335

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

A related GBS gene <SEQ ID 8903> and protein <SEQ ID 8904> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 3 
SRCFLG: 0 

McG: Length of UR: 20 

Peak Value of UR: 1.55 
Net Charge of CR: 1 
McG: Discrim Score: -0.60 
GvH: Signal Score (-7.5): -3.93 

Possible site: 21 
>>> Seems to have no N-terminal signal sequence 
Amino Acid Composition: calculated from 1 
ALOM program count: 1 value: -0.11 threshold: 0.0 

INTEGRAL Likelihood = -0.11 Transmembrane 221 - 237 ( 221 - 238) 
PERIPHERAL Likelihood =4.61 6 
modified ALOM score: 0.52 
icml HYPID: 7 CFP: 0.104 

*** Reasoning Step: 3 



Final Results 

bacterial membrane --- Certainty=0.1044 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

37.5/60.5% over 278aa
Archaeoglobus fulgidus
EGAD|103851| 3-hydroxyacyl-CoA dehydrogenase Insert characterized
OMNI|AF2273 3-hydroxyacyl-CoA dehydrogenase (hbd-10) Insert characterized
GP|2648250|gb|AAB88983.1||AE000948 3-hydroxyacyl-CoA dehydrogenase (hbd-10) Insert characterized
PIR|A69534|A69534 3-hydroxyacyl-CoA dehydrogenase (hbd-10) homolog - Insert characterized

ORF01176(475 - 1431 of 1731)
EGAD|103851|AF2273(17 - 295 of 668) 3-hydroxyacyl-CoA dehydrogenase {Archaeoglobus fulgidus}
OMNI|AF2273 3-hydroxyacyl-CoA dehydrogenase (hbd-10)
GP|2648250|gb|AAB88983.1||AE000948 3-hydroxyacyl-CoA dehydrogenase (hbd-10) {Archaeoglobus fulgidus}
PIR|A69534|A69534 3-hydroxyacyl-CoA dehydrogenase (hbd-10) homolog - Archaeoglobus fulgidus
%Match = 14.8
%Identity = 37.5 %Similarity = 60.4
Matches = 106 Mismatches = 106 Conservative Sub.s = 65

387 417 447 477 507 537 567 597 

KKRYYFKNNHTIYLLLDISFVTCLSSKTFSNISIGGCNMTIKNLT 

= = II : I h|::| II I I »lh II I : : : I 

MPRRVKQVINMDTOERIKTVAVLGAGLMGHGIAEVCAMAGYNVTMRDIKQEFVDRGM 
10 20 30 40 50 



624 651 681 711 741 771 801 831 

ERIKK-LAKVYQS-EIETAKEAYSDKAKSIKYNKNLLPSLDHIFLSKVM 

11= |||: I :|==|:| I :| : :|::|| ||||||l 

NMIKESLAKLEQKGKIKSAEEVLS RIKPTVDLEEAVKDADLVIE 

70 80 90 100 
861 891 921 951 981 1011 1041 1071

AVPETVS I KEDFYKQl^KVAPSKTI FATNS STLVPSQFAM ^ 







120 130 140 150 160 170 180 



1101 1131 1161 1191 1221 1251 1281 1311 

AFSKDIGMVPLHIHKEQPGYIIiNSILVPFIjESAIJ^YyDCTSDSETIDKTWKLGTGAPMGPLEILDIIGIDTAyWIMKNY 







200 210 220 230 240 250 260 






1341 1371 1401 1431 1461 1491 1521 1551 

SDTNSDPNSLHAHIAKMLKEEFIDKGRTGKAAGHGFYDYD*TIKEVR*KSNLFYNSTKE*LHQEQF*NDLKPIDDYYHLS 



AQTIS-PD YEPPKFLEEMVKANKLGRKTGQGFYDWSKGRPQIDSSKATDKINPMDFTFVEINEAVKLVEMGVATPQ 

270 280 290 300 310 320 330 



SEQ ID 8904 (GBS112) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 28 (lane 5; MW 39kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 34 (lane 3; MW 64kDa).

GBS112-GST was purified as shown in Figure 198, lane 10.

Example 1790

A DNA sequence (GBSx1897) was identified in S.agalactiae <SEQ ID 5567> which encodes the amino
acid sequence <SEQ ID 5568>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3332 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10097> which encodes amino acid sequence <SEQ ID
10098> was also identified.



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14467 GB:Z99117 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 62/169 (36%) , Positives = 109/169 (63%) , Gaps = 3/169 (1%) 






Query: 1 ^VLSMLGIIDAKPKVGYFYLGQYHASIGTSHFEKMTVSEIMGIPLTVHQKDSVYDVIVH 60 

+A+L+M G ++A+P+VGYFY G+ + +K+ V + IP+ +H+ SVYD I 

Sbjct: 43 LAILTMSGFLEARPRVGYFYTGKTGTQLLADKLKKLQVKDFQSIPVVIHENVSVYDAICT 102






Query: 61 IFMEDAGCaFILDDDDFLCGWSRKDLLKISIG^DLSKMPIGMVMTRMPHVTTVLENES 120 

+F+ED G F++D D L GV+SRKDLL+ SIG +L+ +P+ ++MTRMP++T + 
Sbjct: 103 MFLEDVGTLFVVDRDAVLVGVLSRKDLLRASIGQQELTSVPVHIIMTRMPNITVCRREDY 162 






Query: 121 LFAAADKLVSRKVDSLPVVRHDKQYPEKFKV'IGKLSKTILASLFLEIRD 169 

+ A L+ +++D+LPV+ K + F+VIG+++KT + + + + + 
Sbjct: 163 VMDIAKHLIEKQIDALPVI KDTDKGFEVIGRVTKTNMTKILVSLSE 208 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1791

A DNA sequence (GBSx1898) was identified in S.agalactiae <SEQ ID 5569> which encodes the amino
acid sequence <SEQ ID 5570>. Analysis of this protein sequence reveals the following: 

Possible site: 22

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.53 Transmembrane 60 - 76 ( 60 - 76)

Final Results

bacterial membrane Certainty=0.1213 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05092 GB:AP001511 unknown conserved protein [Bacillus halodurans] 
Identities = 126/256 (49%) , Positives = 183/256 (71%) , Gaps = 1/256 (0%)

Query: 7 IFIISDSLGETAKAIAKACLSQFPGHDDWHFQRFSYINSQERLEQVFEEASQKOTFMMFS 66 

++++SDS+GETA+ + KA SQF G +R Y+ +E +++V + A Q + F+ 

Sbjct: 10 VYWSDSVGETAEL VVKAAASQFSGAGI - EVRRI PYVEDKETVDE VI QLAKQADAI I AFT 68 


Query: 67 LVDVALASYAQKRCESEHyAYVDLLTNVIQGISRISGIDPLGEPGILRRLDNDYFKRVES 126 

LV + +Y ++ VD++ +++ IS ++ +P EPGI+ RLD DYF++VE+ 

Sbjct: 69 LWPGIRTYLLEKATEAKVETVDIIGPMLEKISSLTKEEPRYEPGIVYRLDEDYFRKVEA 128 

Query: 127 IEFAVKYDDGRDPRGILQADLVIIGISRTSKTPLSMFLADKNIKVINIPLVPEVPVPKEL 186

IEFAVKYDDGRDPRGI++ADLV+IG+SRTSKTPLS +LA K +KV N+PLVPEV P+EL
Sbjct: 129 IEFAVKYDDGRDPRGIVRADLVLIGVSRTSKTPLSQYLAHKRLKVANVPLVPEVEPPEEL 188

Query: 187 RMIDSRRIIGLTNSVDHLNQTOKVRLKSLGLSSTANYASLERILEETRYAEEVMKNLGCP 246 
+ +++IGL S + LN +R RLK+LGL S ANYA+++RI EE YAE +MK +GCP

Sbjct: 189 FKLSPKKVIGLKISPEQLNGIRAERLKTLGLKSQANYANIDRIKEELAYAEGIMKRIGCP 248

Query: 247 IINVSDKAIEETATII 262
+I+VS+KA+EETA +I
Sbjct: 249 VIDVSNKAVEETANLI 264

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 5570 (GBS378) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 68 (lane 4; MW 34kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 72 (lane 2; MW 59kDa).

GBS378-GST was purified as shown in Figure 212, lane 6. 
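
As a quick consistency check on the SDS-PAGE figures quoted in this section, the apparent molecular weights of the His-fusion and GST-fusion forms of the same protein can be compared. The sketch below uses only the values reported in the text for GBS431, GBS112 and GBS378; the helper name `tag_difference` is hypothetical. The remark that a GST moiety is roughly 26 kDa is general background and is not stated in the source.

```python
# Apparent molecular weights (kDa) reported in the text for the same protein
# expressed as a His-fusion and as a GST-fusion.
reported = {
    "GBS431": {"His": 29, "GST": 54},
    "GBS112": {"His": 39, "GST": 64},
    "GBS378": {"His": 34, "GST": 59},
}

def tag_difference(weights):
    # Difference attributable to the larger fusion partner (assumed: GST,
    # commonly quoted as roughly 26 kDa; this figure is not from the source).
    return weights["GST"] - weights["His"]

differences = {name: tag_difference(w) for name, w in reported.items()}
for name, diff in differences.items():
    print(f"{name}: GST-fusion is {diff} kDa heavier than the His-fusion")
```

The three pairs differ by the same amount, which is what would be expected if the only change between constructs is the fusion partner.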

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1792 

A DNA sequence (GBSx1899) was identified in S.agalactiae <SEQ ID 5571> which encodes the amino
acid sequence <SEQ ID 5572>. Analysis of this protein sequence reveals the following: 

Possible site: 47

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3703 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD35361 GB:AE001709 pyruvate,orthophosphate dikinase
[Thermotoga maritima] 
Identities = 494/882 (56%) , Positives = 639/882 (72%) , Gaps = 9/882 (1%) 


Query: 1 METKFVYHFD EGCKEMKELLGGKGANLAEMTSIGLPVPQGFTITTQACNDYYDNAC 56 

M K+VY F EG +MK++LGGKGANLAEMT++G+PVP GFTI+ + C YYD+ 

Sbjct: 1 MAKKYVYFFANGKAEGRADMKDILGGKGANLAEMTNLGIPVPPGFTISAEVCKYYYDHGR 60

Query: 57 HIRESILSQIDQA^QLEWQNKQLGSVDDPLLVSTOSGSVFSMPGMMDTVIiNLGLNDRS 116

E + Q+++AM +LE K+ G ++PLLVSVRSG+ SMPGMMDTVLNLGLND + 
Sbjct: 61 TYPEELKEQVEEAMRRLEEOTGKKFGDPNNPLOTSWSGAAISMPGM^TVXiNLGLNDET 120 

Query: 117 VQGLVKKTEDERFAYDSYRRFIQMFADWTGIPKYKFDTILDRLKTDKCYQDDTELTGSD 176 
V+GL K T +ERFAYD+YRRF+QMF DW IP KF+ L+ LK +K + DTEL D

Sbjct: 121 VKGLAKLTNNERFAYDAYRRFLQMFGDVVLKIPHEKFEKALEELKKEKGVKLDTELDAED 180 

Query: 177 LKRLVEFYKELYQKEAGEKFPQDPKHQLIiIAIFJWFKSWlMPRAKIYRKLNDIPE- -TLG 234 
LK+LVE YK.++Y KE G++FPQDP +QL LAI+AVF SW N RA YR+++ IE LG 
Sbjct: 181 LKKLVERYKQIY-KEEGKEFPQDPWKQLWLAIDAVFGS1#MERAIKYRQIHGIKEGDLLG 239

Query: 235 TAVNIQAMVFGNMGNNSGTGVAFTRNPSTGAANLFGEYLINAQGEDWAGIRTPQSISKL 294 

TAVNI AMVFGNMG +SGTGVAFTR+P+TG +GE+L NAQGEDWAGIRTP + +L 
Sbjct: 240 TAVNIVAMVFGNMGEDSGTGVAFTRDPNTGEKKPYGEFLPNAQGEDWAGIRTPLKLEEL 299 


Query: 295 AEQMPIIYQEFVSVTQKLEAHYRDMQDMEFTIENGNLYMLQTRSGKRTAKAAIKIAVDQV 354 

+MP +Y + + + KLE HYRDMQD+EFT+E G LY+LQTR+GKRT++AAI+IAVD V 
Sbjct: 300 KNRMPEVYNQLLEIMDKLEKBTyRDMQDIEFTVERGKLYILQTRNGKRTSQAAIRIAVDMV 359 

Query: 355 NEGLISKEEAILRIEPKQLDQLLHPSFDLKSLKKAIILTTGLPASPGAAYGKVYFHAEDV 414

+EGLI+KEEAILR+ P+ ++Q+LHP FD K +A ++ GLPASPGAA GKV F+A+ 
Sbjct: 360 HEGLITKEFAILRWPEDVEQVLHPVPDPKEKAQAKVIAKGLPASPGAATGKVVFNAKKA 419 

Query: 415 VKEMKKGNPVI;LTOQETSPEDIEGMVSANGIITARGGMTSHaAVVARGMGKPCVAGCSQL 474 
+ K G V+LVR ETSPED+ GM +A GI+T+RGGMTSHAAWARGMGKP V G +

Sbjct: 420 EELGKAGEQVILVRPETSPEDVGGMAAAQGILTSRGGMTSHAAWARGMGKPAWGAESI 479 

Query: 475 LVDEVRREISIGHQTIKEGEMLSIDGATGNVYIGQV-PMAETSVDRDFEIFMKWVDENRD 533 
V +G +KEGE +SIDG TG V +G+V + ++ ++W DE R 

Sbjct: 480 EVHPEEGYFKVGDWVKEGEWISIDGTTGEVLLGKVTTIKPQGLEGPVAELLQWADEIRR 539

Query: 534 MMVCSNADNPRDAQKALDFGAEGIGLCRTEHMFFDDERIPWREMILADEILSRRKALER 593 

+ V +NAD PRDA+ A FGAEGIGLCRTEHMFF+ +RIP VR MILA R KAL+ 

Sbjct: 540 LGVRTNADIPRDAEVARKFGAEGIGLCRTEHMFFEKDRIPKVRRMILAKTKEEREKALDE 599 


Query: 594 LLSFQRDDFYQIFKVLKGKACTIRLLDPPLHEFLPHDKESIESMARQMGISTLAIEKRIQ 653 

LL Q++DF +F+V+KG TIRL+DPPLHEFLP + E 1+ +A QMG+S ++ ++ 
Sbjct: 600 LLPLQKEDFKGLFRVMKGLPVTIRLIDPPLHEFLPQEDEQIKEVAEQMGVSFEELKNWE 659 

Query: 654 TLEEFNPMLGHRGCRLAITYPEIYQMQVRALVQGAI-LAMKEGYEAKPEIMIPLVTAHEE 712

L+E NPMLGHRGCRL ITYPEI MQ +A++ AI L +EG + PEIMIPLV E 
Sbjct: 660 NLKELNPMLGHRGCRLTITYPEIAVMQTKAIIGAAIELKKEEGIDVIPEIMIPLVGHVNE 719 

Query: 713 ISIIRDLIEETIVEESKSKKINLSFPIGTMIETPRACMIADDIAKFADFFSFGTNDLTQM 772 
+ ++ +I+ET K + L++ IGTMIE PRA + A IA+ A+FFSFGTNDLTQM

Sbjct: 720 LRYLHCIIKETADALIKEAGVELTYKIGTMIEVPRAAvTAHQIAEEAEFFSFGTNDLTQM 779 

Query: 773 SFGFSRDDAGKFLGEYVDKGLLKKDPFQVLDQKGIGRFIGQAVRLGKEVKPNLKIGICGE 832 
+FGFSRDD GKFL EY++KG+L+ DPF+ LD G+G + G+ +P+LK+G+CGE 

Sbjct: 780 TFGFSRDDVGKFLPEYLEKGILEHDPFKTLDYDGVGELVRMGKEKGRSTRPDLKVGVCGE 839

Query: 833 HGGEPSSIEFCYQLGLHYVSCSPFRIPIAKLAAAQAKIKQSR 874 

HGG+P SI F ++GL YVSCSP+R+P+A+LAAAQA +K + 
Sbjct: 840 HGGDPRSILFFDKIGLDYVSCSPYRVPVARLAAAQAALKNKK 881 



WO 02/34771 



PCT/GB01/04789 



-2028- 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1793 

A DNA sequence (GBSx1900) was identified in S.agalactiae <SEQ ID 5573> which encodes the amino
acid sequence <SEQ ID 5574>. This protein is predicted to be glutamyl-tRNA (Gln) amidotransferase
subunit C (gatC). Analysis of this protein sequence reveals the following:

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3229 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB04384 GB:AP001509 glutamyl-tRNA (Gln) amidotransferase
subunit C [Bacillus halodurans]
Identities = 42/94 (44%) , Positives = 63/94 (66%)

Query: 2 KISEEEVRHVANLSKLRFSDQETKEFASSLSKIVDMIELLNEVDTEGVPVTTTMADRKTV 61

+IS E+V+HVA+L++L +++E K F L I+ E LNE+DTEGV T+ + D K V
Sbjct: 3 RISMEQVKHVAHLARLAITEEEAKLFTEQLGDIIQFAEQLNELDTEGVEPTSHVLDMKNV 62

Query: 62 MREDIAQPGHNRDDLFKNVPQHQDYYIKVPAILE 95
+RED + G +D+ KN P H+D I+VP++LE

Sbjct: 63 LREDKPEKGLPVEDVLKNAPDHEDGQIRVPSVLE 96

A related DNA sequence was identified in S.pyogenes <SEQ ID 5575> which encodes the amino acid 
sequence <SEQ ID 5576>. Analysis of this protein sequence reveals the following: 

Possible site: 60

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3247 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 72/100 (72%) , Positives = 88/100 (88%) 

Query: 1   MKISEEEVRHVANLSKLRFSDQETKEFASSLSKIVDMIELLNEVDTEGVPVTTTMADRKT 60
           MKISEEEVRHVA LSKL FS+ ET  FA++LSKIVDM+ELLNEVDTEGV +TTTMAD+K
Sbjct: 5   MKISEEEVRHVAKLSKLSFSESETTTFATTLSKIVDMVELLNEVDTEGVAITTTMADKKN 64

Query: 61  VMREDIAQPGHNRDDLFKNVPQHQDYYIKVPAILEDGGDA 100
           VMR+D+A+ G +R  LFKNVP+ ++++IKVPAIL+DGGDA
Sbjct: 65  VMRQDVAEEGTDRALLFKNVPEKENHFIKVPAILDDGGDA 104
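
The "Identities" figure reported for an alignment is the number of columns in which the two aligned residues are the same. A minimal sketch of that count is shown below for one row of the GAS/GBS alignment above; `alignment_identity` is a hypothetical helper, and the "Positives" figure is not reproduced because it depends on a substitution matrix that the source does not specify.

```python
def alignment_identity(query: str, subject: str, gap: str = "-"):
    """Count identical columns in two pre-aligned, equal-length sequences."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        1
        for q, s in zip(query.upper(), subject.upper())
        if q == s and q != gap  # a gap aligned to a gap is not an identity
    )
    return matches, len(query)

# Final row of the GAS/GBS alignment (GBS residues 61-100, GAS residues 65-104).
gbs = "VMREDIAQPGHNRDDLFKNVPQHQDYYIKVPAILEDGGDA"
gas = "VMRQDVAEEGTDRALLFKNVPEKENHFIKVPAILDDGGDA"
matches, columns = alignment_identity(gbs, gas)
print(f"{matches}/{columns} identical columns in this row")
```

Summing such per-row counts over every row of an alignment gives the numerator of the overall "Identities" fraction.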

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1794 

A DNA sequence (GBSx1901) was identified in S.agalactiae <SEQ ID 5577> which encodes the amino
acid sequence <SEQ ID 5578>. Analysis of this protein sequence reveals the following: 

Possible site: 30

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -7.64 Transmembrane 7 - 23 ( 6 - 24)

Final Results

bacterial membrane Certainty=0.4057 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1795 

A DNA sequence (GBSx1902) was identified in S.agalactiae <SEQ ID 5579> which encodes the amino
acid sequence <SEQ ID 5580>. This protein is predicted to be glutamyl-tRNA amidotransferase, subunit A
(gatA). Analysis of this protein sequence reveals the following:



Possible site: 55 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.2855 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB04385 GB:AP001509 glutamyl-tRNA (Gln) amidotransferase
subunit A [Bacillus halodurans] 
Identities = 285/486 (58%) , Positives = 367/486 (74%) , Gaps = 4/486 (0%) 

Query: 1 MSFNNQSIDQLHDFLVKKEISATELTKATLEDIHAREQAVGSFITISDEMAIAQAKEID- 59 

MS + + +H L +KEIS ++L +1 + V +F+ +++E A A AKE+D 

Sbjct: 1 MSLFDLKLKDVHTKLHEKEISVSDLvBFAYKRIEQVDGQvEAFIALNEEKARAYAKELDA 60 

Query: 60 --DKGIDADNVMSGIPLAVKDNISTKGILTTAASKMLYNYEPIFDATAVEKLYAKDMIVI 117

D+ +A ++ GIP+ VKDNI TK + TT +S++L N++PI+DAT V KL +1 
Sbjct: 61 ALDRS-EARGLLFGIPIGVKDNIVTKNLRTTCSSRILGNFDPIYDATVVHKLREAQAVTI 119

Query: 118 GKANMDEFAMGGSTETSYFKKTNNAWDHSKVPGGSSGGSAAAVASGQVRLSLGSDTGGSI 177 
GK NMDEFAMG STE S F+KT N W+ VPGGSSGGSAAAVA+G+V +LGSDTGGSI

Sbjct: 120 GKLNMDEFAMGSSTENSAFQKTKNPWNLEYVPGGSSGGSAAAVAAGEVPFTLGSDTGGSI 179 

Query: 178 RQPASFNGIVGMKPTYGRVSRFGLFAFGSSLDQIGPMSQTVKENAQLLTVISGHDVRDST 237 
RQPA++ G+VG+KPTYGRVSR+GL AF SSLDQIGP+++ V++NA LL ISGHD DST 
Sbjct: 180 RQPAAYCGVVGLKPTYGRVSRYGLVAFASSLDQIGPITRNVEDNAYLLQAISGHDPMDST 239

Query: 238 SSERTVGDFTAKIGQDIQGMKIALPKEYLGEGIAQGVKETIIKAAKHLEKLGAVIEEVSL 297 

S+ V D+ + + DI+G+KIA+PKEYLGEG+ + VK++++ A K LE LGA EEVSL 
Sbjct: 240 SANLDVPDYLSALTGDIKGLKIAVPKEYLGEGVKEEVKQSVLDALKVLEGLGATWEEVSL 299 


Query: 298 PHSKYGVAVYYIVASSEASSNLQRFDGIRYGYRTENYKNLDDIYVNTRSEGFGDEVKRRI 357 

PHSKY +A YY++ASSEAS+NL RFDG+RYG+R++N NL D+Y TR+EGFGDEVKRRI 
Sbjct: 300 PHSKYALATYYLLASSFiASANLARFDGTOYGFRSDNADNLLDIWKQTRAEGFGDEVKRRI 359 

Query: 358 MLGTFSLSSGYYDAYYKKAGQVRSLIIQDFEKVFADYDLILGPTAPTTAFDLDSLNHDPV 417

MLGTF+LSSGYYDAYYKKA QVR+LI QDFEKVF YD+I+GPT PT AF + DP+ 
Sbjct: 360 MLGTFALSSGYYDAYYKKAQQVRTLIKQDFEKVFEQYDVIIGPTTPTPAFKIGEKTDDPL 419 



Query: 418 AMYLADILTIPVNLAGLPGISIPAGFDQGLPVGMQLIGPKFSEETIYQVAAAFEATTDYH 477 
MY DILTIPVNLAG+P IS+P GFD GLP+G+Q+IG F E ++Y+VA AFE TDYH
Sbjct: 420 TMYANDILTIPVNLAGVPAISVPCGFDNGLPLGLQIIGKHFDEGSVYRVAHAFEQATDYH 479 

Query: 478 KQQPKI 483 
++P +

Sbjct: 480 TKRPTL 485 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5581> which encodes the amino acid 
sequence <SEQ ID 5582>. Analysis of this protein sequence reveals the following: 

Possible site: 57

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2364 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 392/487 (80%) , Positives = 442/487 (90%) 

Query: 1   MSFNNQSIDQLHDFLVKKEISATELTKATLEDIHAREQAVGSFITISDEMAIAQAKEIDD 60
           MSFN+++I++LHD LV KEISATELT+ATLEDI +RE+AVGSFIT+S+E+A+ QA  ID
Sbjct: 1   MSFNHKTIEELHDLLVAKEISATELTQATLEDIKSREEAVGSFITVSEEVALKQAAAIDA 60

Query: 61  KGIDADNVMSGIPLAVKDNISTKGILTTAASKMLYNYEPIFDATAVEKLYAKDMIVIGKA 120
           KGIDADN+MSGIP+AVKDNISTK ILTTAASKMLYNYEPIF+AT+V   YAKDMIVIGK
Sbjct: 61  KGIDADNLMSGIPIAVKDNISTKEILTTAASKMLYNYEPIFNATSVANAYAKDMIVIGKT 120

Query: 121 NMDEFAMGGSTETSYFKKTNNAWDHSKVPGGSSGGSAAAVASGQVRLSLGSDTGGSIRQP 180
           NMDEFAMGGSTETSYFKKT NAWDH+KVPGGSSGGSA AVASGQVRLSLGSDTGGSIRQP
Sbjct: 121 NMDEFAMGGSTETSYFKKTKNAWDHTKVPGGSSGGSATAVASGQVRLSLGSDTGGSIRQP 180

Query: 181 ASFNGIVGMKPTYGRVSRFGLFAFGSSLDQIGPMSQTVKENAQLLTVISGHDVRDSTSSE 240
           A+FN +VG+KPTYG VSR+GL AFGSSLDQIGP + TVKENAQLL VI+  DV+D+TS+
Sbjct: 181 AAFNSVVGLKPTYGTVSRYGLIAFGSSLDQIGPFAPTVKENAQLLNVIASSDVKDATSAP 240

Query: 241 RTVGDFTAKIGQDIQGMKIALPKEYLGEGIAQGVKETIIKAAKHLEKLGAVIEEVSLPHS 300
             + D+T+KIG+DI+GMKIALPKEYLGEGI   +KET+I + K  E LGA +EEVSLPHS
Sbjct: 241 VVIADYTSKIGRDIKGMKIALPKEYLGEGIDPEIKETVIASVKQFEALGATVEEVSLPHS 300

Query: 301 KYGVAVYYIVASSEASSNLQRFDGIRYGYRTENYKNLDDIYVNTRSEGFGDEVKRRIMLG 360
           KYGVAVYYI+ASSEASSNLQRFDGIRYG+R ++ KNLD+IYVNTRS+GFGDEVKRRIMLG
Sbjct: 301 KYGVAVYYIIASSEASSNLQRFDGIRYGFRADDAKNLDEIYVNTRSQGFGDEVKRRIMLG 360

Query: 361 TFSLSSGYYDAYYKKAGQVRSLIIQDFEKVFADYDLILGPTAPTTAFDLDSLNHDPVAMY 420
           TFSLSSGYYDAY+KKAGQVR+LIIQDF+KVFADYDLILGPT PT AF LD+LNHDPVAMY
Sbjct: 361 TFSLSSGYYDAYFKKAGQVRTLIIQDFDKVFADYDLILGPTTPTVAFGLDTLNHDPVAMY 420

Query: 421 LADILTIPVNLAGLPGISIPAGFDQGLPVGMQLIGPKFSEETIYQVAAAFEATTDYHKQQ 480
           +AD+LTIPVNLAGLPGISIPAGF  GLPVG+QLIGPK++EETIYQ AAAFEA TDYHKQQ
Sbjct: 421 IADLLTIPVNLAGLPGISIPAGFVDGLPVGLQLIGPKYAEETIYQAAAAFEAVTDYHKQQ 480

Query: 481 PKIFGGE 487
           P IFGG+
Sbjct: 481 PIIFGGD 487

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1796

A DNA sequence (GBSx1903) was identified in S.agalactiae <SEQ ID 5583> which encodes the amino
acid sequence <SEQ ID 5584>. This protein is predicted to be glutamyl-tRNAGln amidotransferase subunit 
B (gatB). Analysis of this protein sequence reveals the following: 

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3935 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10095> which encodes amino acid sequence <SEQ ID 
10096> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04386 GB:AP001509 glutamyl-tRNA (Gln) amidotransferase
subunit B [Bacillus halodurans] 
Identities = 308/476 (64%) , Positives = 361/476 (75%) , Gaps = 1/476 (0%) 



Query: 1   MNFETVIGLEVHVELNTNSKIFSPSSAHFGQEQNANTNVIDWSFPGVLPVMNKGVIDAGI 60
           MNFETVIGLEVHVEL T SKIFS S  HFG E NANT+VID  +PGVLPV+NK  ++  +
Sbjct: 1   MNFETVIGLEVHVELKTESKIFSASPNHFGAEPNANTSVIDLGYPGVLPVLNKAAVEFAM 60

Query: 61  KAALALNMDIHQNMHFDRKNYFYPDNPKAYQISQFDEPIGYNGWIEIELEDGTRKKIRIE 120
           KAA+ALN ++  +  FDRKNYFYPDNPKAYQISQFD+PIG NGWIEIE+ DGT+KKI I
Sbjct: 61  KAAMALNCEVATDTKFDRKNYFYPDNPKAYQISQFDKPIGENGWIEIEV-DGTKKKIGIT 119

Query: 121 RAHLEEDAGKNTHGTDGYSYVDLNRQGVPLIEIVSEADMRSPEEAYAYLTALKEIIQYTG 180
           R HLEEDAGK TH  +GYS VD NRQG PLIEIVSE D+R+P+EAYAYL  LK IIQYTG
Sbjct: 120 RLHLEEDAGKLTHSGNGYSLVDFNRQGTPLIEIVSEPDIRTPQEAYAYLEKLKSIIQYTG 179

Query: 181 ISDVKMEEGSMRVDANISLRPYGQEEFGTKAELKNLNSFNNVRKGLIHEEKRQAQVLRSG 240
           +SD KMEEGS+R DANISLRP GQEEFGTK ELKNLNSFN VRKGL +EEKRQAQVL SG
Sbjct: 180 VSDCKMEEGSLRCDANISLRPVGQEEFGTKTELKNLNSFNFVRKGLEYEEKRQAQVLLSG 239

Query: 241 GQIQQETRRFDETTGETILMRVKEGSSDYRYFPEPDLPLFDISDEWIDQVRLELPEFPQE 300
           G+I QETRR+DE   +T+LMRVKEGS DYRYFPEPDL    I DEW  ++R E+PE P
Sbjct: 240 GEILQETRRYDFJU^KTVLMRVKEGSDDYRYFPEPDLVALHIDDEWKARIRSEIPELPDA 299

Query: 301 RRAKYVSSFGLSSYDASQLTATKATSDFFEKAVAIGGDAKQVSNWLQGEVAQFLNSESKS 360
           R+ +YV   GL +YDA  LT TK  SDFFE+ +A G D K  SNWL GEV+ +LN+E K
Sbjct: 300 RKKRYVEELGLPAYDAKVLTLTKEMSDFFEETIAKGADPKLASNWLMGEVSGYLNAEQKE 359

Query: 361 IEEIGLTPENLVEMIGLIADGTISSKIAKKVFVHLAKNGGSAEEFVKKAGLVQISDPEVL 420
           ++E+ LTP+ L +MI LI  GTISSKIAKKVF  L + GG  EE VK  GLVQISD   L
Sbjct: 360 LDEVALTPDGLAKMIQLIEKGTISSKIAKKVFKDLIEKGGDPEEIVKAKGLVQISDEGEL 419

Query: 421 IPIIHQVFADNEAAVIDFKSGKRNADKAFTGYLMKATKGQANPQVALKLLAQELAK 476
              + +V  +N+ ++ DFK+GK  A     G +MKATKG+ANP +  KLL +E+ K
Sbjct: 420 RKYVVEVLDNNQQSIDDFKNGKDRAIGFLVGQIMKATKGKANPPMVNKLLLEEINK 475

A related DNA sequence was identified in S.pyogenes <SEQ ID 5585> which encodes the amino acid 
sequence <SEQ ID 5586>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3935 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 410/479 (85%) , Positives = 447/479 (92%) 

Query: 1   MNFETVIGLEVHVELNTNSKIFSPSSAHFGQEQNANTNVIDWSFPGVLPVMNKGVIDAGI 60
           MNFET+IGLEVHVELNTNSKIFSPSSAHFG++ NANTNVIDWSFPGVLPVMNKGVIDAGI
Sbjct: 1   MNFETIIGLEVHVELNTNSKIFSPSSAHFGEDPNANTNVIDWSFPGVLPVMNKGVIDAGI 60

Query: 61  KAALALNMDIHQNMHFDRKNYFYPDNPKAYQISQFDEPIGYNGWIEIELEDGTRKKIRIE 120
           KAALALNMDIH+ MHFDRKNYFYPDNPKAYQISQFDEPIGYNGWI+I+LEDG+ KKIRIE
Sbjct: 61  KAALALNMDIHKEMHFDRKNYFYPDNPKAYQISQFDEPIGYNGWIDIKLEDGSTKKIRIE 120

Query: 121 RAHLEEDAGKNTHGTDGYSYVDLNRQGVPLIEIVSEADMRSPEEAYAYLTALKEIIQYTG 180
           RAHLEEDAGKNTHGTDGYSYVDLNRQGVPLIEIVSEADMRSPEEAYAYLTALKEIIQYTG
Sbjct: 121 RAHLEEDAGKNTHGTDGYSYVDLNRQGVPLIEIVSEADMRSPEEAYAYLTALKEIIQYTG 180

Query: 181 ISDVKMEEGSMRVDANISLRPYGQEEFGTKAELKNLNSFNNVRKGLIHEEKRQAQVLRSG 240
           ISDVKMEEGSMRVDANISLRPYGQE+FGTK ELKNLNSF+NVRKGL  E +RQA++LRSG
Sbjct: 181 ISDVKMEEGSMRVDANISLRPYGQEQFGTKTELKNLNSFSNVRKGLEFEVERQAKLLRSG 240

Query: 241 GQIQQETRRFDETTGETILMRVKEGSSDYRYFPEPDLPLFDISDEWIDQVRLELPEFPQE 300
           G I+QETRR+DE    TILMRVKEG++DYRYFPEPDLPL++I D WID++R +LP+FP +
Sbjct: 241 GVIRQETRRYDEANKGTILMRVKEGAADYRYFPEPDLPLYEIDDAWIDEMRAQLPQFPAQ 300

Query: 301 RRAKYVSSFGLSSYDASQLTATKATSDFFEKAVAIGGDAKQVSNWLQGEVAQFLNSESKS 360
           RRAKY    GLS+YDASQLTATK  SDFFE AV++GGDAKQVSNWLQGEVAQFLN+E K+
Sbjct: 301 RRAKYEEELGLSAYDASQLTATKVLSDFFETAVSLGGDAKQVSNWLQGEVAQFLNAEGKT 360

Query: 361 IEEIGLTPENLVEMIGLIADGTISSKIAKKVFVHLAKNGGSAEEFVKKAGLVQISDPEVL 420
           IEEI LTPENLVEMI +IADGTISSK+AKKVFVHLAKNGGSA  +V+KAGLVQISDP VL
Sbjct: 361 IEEIALTPENLVEMIAIIADGTISSKMAKKVFVHLAKNGGSARAYVEKAGLVQISDPAVL 420

Query: 421 IPIIHQVFADNEAAVIDFKSGKRNADKAFTGYLMKATKGQANPQVAL^ 479
           +PIIHQVFADNEAAV DFKSGKRNADKAFTG+LMKATKGQANPQVA +LLAQEL KL+ +
Sbjct: 421 VPIIHQVFADNEAAVADFKSGKRNADKAFTGFLMKATKGQANPQVAQQLLAQELQKLRD 479
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1797 

A DNA sequence (GBSx1904) was identified in S.agalactiae <SEQ ID 5587> which encodes the amino
acid sequence <SEQ ID 5588>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -7.27 Transmembrane 108 - 124 ( 105 - 125) 
INTEGRAL Likelihood = -7.27 Transmembrane 278 - 294 ( 268 - 301) 
INTEGRAL Likelihood = -6.05 Transmembrane 191 - 207 ( 188 - 208) 
INTEGRAL Likelihood = -5.63 Transmembrane 219 - 235 ( 215 - 242) 
INTEGRAL Likelihood = -3.93 Transmembrane 41 - 57 ( 39 - 58) 
INTEGRAL Likelihood = -3.88 Transmembrane 132 - 148 ( 131 - 150) 
INTEGRAL Likelihood = -3.03 Transmembrane 254 - 270 ( 253 - 272) 
INTEGRAL Likelihood = -3.03 Transmembrane 79 - 95 ( 79 - 95) 

Final Results 

bacterial membrane Certainty=0 .3909 (Affirmative) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 10093> which encodes amino acid sequence <SEQ ID 
10094> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA04271 GB:AJ000733 hypothetical protein [Bacillus megaterium] 
Identities = 102/292 (34%) , Positives = 169/292 (56%) , Gaps = 3/292 (1%) 

Query: 6 TKKEKGTMMTLAAGLAWGISGISGQYLMSH-GVHVNLLTSLRLLITGIFLLSLARSKQKE 64 
+++ G ++ + WG+SG QYL H + L +R+L++G+ LL++A SKQ+ 
Sbjct: 1 SRRAWGLLLVIIGATMWGVSGTVAQYLFQHKSFNAEWLVWRMLVSGLLLLAIA-SKQR- 58 

Query: 65 HLVAAWKQPKFLKQVLLFSIFGLVLNQYAFLRAIHLTNAGTATVLQYMAPILILSIVCIL 124 
H AWK + +LLF + G++ QY + AI NA TATVLQY +PI I+ + + 
Sbjct: 59 NIFAIWKTKEERTSLLLFGVIGMLGVQYTYFAAIEAGMAATATVLQYTSPIFIIGYIAVQ 118 

Query: 125 
R+ P E+I++ + I GT+ +AT G L+IT L WG+G+A+T + Y L P +L+ 
Sbjct: 119 

Query: 185 
+W S V+G GM IGG FS + W + +++S+ A + +I GT+ A+ +L+ + 
Sbjct: 179 

Query: 245 
+ A + +LAS EP+S+ L+VL L F + LG + I V L+S + 
Sbjct: 239 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
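
The transmembrane predictions reported in these Examples follow a fixed line layout ("INTEGRAL Likelihood = ... Transmembrane a - b ( c - d)"). The following is a minimal sketch, not part of the original analysis, of how such lines might be read back into structured records. The names `TMSegment` and `parse_tm_segments` are hypothetical, and the pattern assumes the layout shown above; it also tolerates a `{` in place of `(`, which occurs in this text as a scanning artefact.

```python
import re
from typing import List, NamedTuple


class TMSegment(NamedTuple):
    """One predicted membrane-spanning segment (hypothetical record type)."""
    likelihood: float   # more negative = stronger prediction
    start: int          # core segment start residue
    end: int            # core segment end residue
    outer_start: int    # extended segment start (value in parentheses)
    outer_end: int      # extended segment end (value in parentheses)


# Assumed line layout:
#   INTEGRAL Likelihood = -7.27 Transmembrane 108 - 124 ( 105 - 125)
_TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*[({]\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text: str) -> List[TMSegment]:
    """Return every transmembrane segment line found in a report."""
    return [
        TMSegment(float(m.group(1)), int(m.group(2)), int(m.group(3)),
                  int(m.group(4)), int(m.group(5)))
        for m in _TM_RE.finditer(text)
    ]


if __name__ == "__main__":
    report = (
        "INTEGRAL Likelihood = -7.27 Transmembrane 108 - 124 ( 105 - 125)\n"
        "INTEGRAL Likelihood = -3.03 Transmembrane 79 - 95 ( 79 - 95)\n"
    )
    for seg in parse_tm_segments(report):
        print(seg)
```

Sorting the returned records by `likelihood` lists the strongest predicted segments first.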

Example 1798 

A DNA sequence (GBSx1905) was identified in S.agalactiae <SEQ ID 5589> which encodes the amino 
acid sequence <SEQ ID 5590>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0 . 2103 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 10091> which encodes amino acid sequence <SEQ ID 
10092> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14510 GB:Z99117 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 52/153 (33%) , Positives = 88/153 (56%) 

Query: 17 YRPTFVVEAVYDLRAEDLLRHGIRAVLVDLDNTLIAWNNPDGTAEVRAWLDEMTTADISV 76 
+ P V+ ++ + E L ++ ++ DLDNTL+ W+ P+ T + W +EM I V 

Sbjct: 6 FLPDEFVKNIFHITPEKLKERNVKGIITDLDNTLVEWDRPNATPRLIEWFEEMKEHGIKV 65 

Query: 77 VVVSNNNHARVERAVSRFGVDFVSRAMKPFTRGINMAIERYGFDRDEVIMVGDQLMTDIR 136 
+VSNNN RV+ G+ F+ +A KP + N A+ +++ +++GDQL+TD+ 

Sbjct: 66 TIVSNNNERRVKLFSEPLGIPFIYKARKPMGKAFJSIRAVRN^LKICEDCWIGDQLLTDVL 125 

Query: 137 ASHRAGIKSVLVKPIVKSDAWNTKFNRLRERRV 169 

+R G ++LV P+ SD + T+FNR ERR+ 
Sbjct: 126 GGNRNGYHTILVVPVASSDGFITRFNRQVERRI 158 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5591> which encodes the amino acid 
sequence <SEQ ID 5592>. Analysis of this protein sequence reveals the following: 
Possible site: 51 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0. 4252 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 147/175 (84%) , Positives = 158/175 (90%) 

Query: 12 LSIDDYRPTFVVEAVYDLRAEDLLRHGIRAVLVDLDNTLIAWNNPDGTAEVRAWLDEMTT 71 

+SIDDYRPT++VEA+YDLRA DLLRHGI AVLVDLDNTLIAWNNPDGT EVRAWLDEMT 
Sbjct: 20 MSIDDYRPTYMVEAIYDLRANDLLRHGITAVLVDLDNTLIAWNNPDGTPEVRAWLDEMTI 79 

Query: 72 ADISVVVVSNNNHARVERAVSRFGVDFVSRAMKPFTRGINMAIERYGFDRDEVIMVGDQL 131 

ADISVVVVSNN H+RVERAVSRFGVDF+SRA+KPF GI AI RYGFDR+EVIMVGDQL 
Sbjct: 80 ADISVVVVSNNKHSRVERAVSRFGVDFISRALKPFAYGIEKAIARYGFDRNEVIMVGDQL 139 

Query: 132 MTDIRASHRAGIKSVLVKPIVKSDAWNTKFNRLRERRVWKKIEENYGKIVYQKGI 186 

MTDIRASHRAGIKSVLVKP+V SDAWNTK NR RERRV  K+EE YGK+ YQKGI 
Sbjct: 140 MTDIRASHRAGIKSVLVKPLVASDAWNTKINRWRERRVMAKLEEKYGKLSYQKGI 194 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
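
Each alignment above is introduced by a summary line of the form "Identities = a/b (p%) , Positives = c/d (q%)", optionally followed by a "Gaps" term. As an illustrative sketch only, such a line could be read into its raw counts as follows. The function names `parse_alignment_summary` and `fraction` are hypothetical, and the printed percentages are deliberately ignored rather than recomputed, since the rounding rule used by the original alignment software is not stated in this text.

```python
import re
from typing import Dict, Tuple

# Matches e.g. "Identities = 147/175", "Positives = 158/175", "Gaps = 8/373".
_SUMMARY_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_summary(line: str) -> Dict[str, Tuple[int, int]]:
    """Map each reported statistic to its (count, alignment length) pair."""
    return {
        name.lower(): (int(count), int(length))
        for name, count, length in _SUMMARY_RE.findall(line)
    }


def fraction(stats: Dict[str, Tuple[int, int]], key: str) -> float:
    """Return count / length for one statistic, e.g. the identity fraction."""
    count, length = stats[key]
    return count / length


if __name__ == "__main__":
    stats = parse_alignment_summary(
        "Identities = 147/175 (84%) , Positives = 158/175 (90%)"
    )
    print(stats)
    print(round(fraction(stats, "identities"), 3))
```

Working from the raw counts rather than the printed percentages makes it possible to compare alignments of different lengths on the same footing.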

Example 1799 

A DNA sequence (GBSx1906) was identified in S.agalactiae <SEQ ID 5593> which encodes the amino 
acid sequence <SEQ ID 5594>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0 . 1091 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14509 GB:Z99117 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 220/373 (58%) , Positives = 280/373 (74%) , Gaps = 8/373 (2%) 



Query: 1 MEELFCIGCGARIQTENKDAAGYTPRAALEKGLETGELYCQRCFRLRHYNEITDVHITDD 60 
ME++ CIGCG IQTE+K GY P A+L K + CQRCFRL++YNEI DV +TDD 
Sbjct: 1 MEKVVCIGCGVTIQTEDKTGLGYAPPASLTKE----NVICQRCFRLKNYNEIQDVSLTDD 56 

Query: 61 EFLKLLHEVGDSDALVVNVIDIFDFNGSIIPGLSRFVAGNDVLLVGNKKDILPKSVKDGK 120 
+FL +LH +G++D+LVV ++DIFDFNGS I GL R V GN +LLVGNK DILPKS+K  + 
Sbjct: 57 DFLNILHGIGETDSLVVKIVDIFDFNGSWINGLQRLVGGNPILLVGNKADILPKSLKRER 116 

Query: 121 VTQWLTERAHEEGLRPVDVILTSAQNHHAIKDLIDTIEKYRHGQDVYVVGVTNVGKSTLI 180 
+ QW+ A E GL+PVDV L SA I+++ID IE YR+G+DVYVVG TNVGKST I 
Sbjct: 117 LIQWMKREAKELGLKPVDVFLVSAGRGQGIREVIDAIEHYRNGKDVYVVGCTNVGKSTFI 176 

Query: 181 NAIIREITGSRDVITTSRFPGTTLDKIEIPLDDGSYIFDTPGIIHRHQMAHYLTAKNLKY 240 
N II+E++G D+ITTS+FPGTTLD IEIPLDDGS ++DTPGII+ HQMAHY+ K+LK 
Sbjct: 177 NRIIKEVSGEEDIITTSQFPGTTLDAIEIPLDDGSSLYDTPGIINNHQMAHYVNKKDLKI 236 

Query: 241 VSPKKEIKPKTYQLNSEQTLFIAGLARFDFISGQKQGFTAYFDNNLNLHRTKLVGADEFY 300 
+SPKKE+KP+T+QLN +QTL+ GLARFD++SG++ F Y N L +HRTKL AD Y 
Sbjct: 237 LSPKKELKPRTFQLNDQQTLYFGGLARFDYVSGERSPFICYMPNELMIHRTKLENADALY 296 

Query: 301 TKHVGKLLTPPTGKEVSDFPKLVRHEFTIKD-KMDIVYSGLGWIRVKSEAENPVVVAAWA 359 
KH G+LLTPP E+ +FP+LV H FTIKD K DIV+SGLGW+ V + V A+A 
Sbjct: 297 EKHAGELLTPPGKDEMDEFPELVAHTFTIKDKKTDIVFSGLGWVTVHDADKK---VTAYA 353 

Query: 360 PEGVAVVLRKALI 372 
P+GV V +R++LI 
Sbjct: 354 PKGVHVFVRRSLI 366 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5595> which encodes the amino acid 
sequence <SEQ ID 5596>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

>GP:CAB14509 GB:Z99117 similar to hypothetical proteins [Bacillus subtilis] 

Identities = 220/373 (58%) , Positives = 286/373 (75%) , Gaps = 8/373 (2%) 

Query: 1 MEELFCIGCGIQIQTEDKEKAGFTPAAALKKGMETGELYCQRCFRLRHYNEITDVHITDD 60 
ME++ CIGCG+ IQTEDK G+ P A+L K + CQRCFRL++YNEI DV +TDD 

Sbjct: 1 MEKVVCIGCGVTIQTEDKTGLGYAPPASLTKE----NVICQRCFRLKNYNEIQDVSLTDD 56 

Query: 61 EFLRLLHEVGDSDALVVNVIDIFDFNGSIIPGLSRFISGNDVLLVGNKKDILPKSVKDGK 120 

+FL +LH +G++D+LVV ++DIFDFNGS I GL R + GN +LLVGNK DILPKS+K + 
Sbjct: 57 DFLNILHGIGETDSLVVKIVDIFDFNGSWINGLQRLVGGNPILLVGNKADILPKSLKRER 116 

Query: 121 VTQWLTERAHEEGLRPLDVMLTSAQNKYAIKDLIGRINELRNGRDVYVVGVTNVGKSTLI 180 

+ QW+ A E GL+P+DV L SA I+++I I RNG+DVYVVG TNVGKST I 

Sbjct: 117 LIQWMKREAKELGLKPVDVFLVSAGRGQGIREVIDAIEHYRNGKDVYVVGCTNVGKSTFI 176 

Query: 181 NAIIQEITGNKDVITTSRFPGTTLDKIEIPLDDGTFIFDTPGIIHRHQMAHYLSPKELKI 240 

N II+E++G +D+ITTS+FPGTTLD IEIPLDDG+ ++DTPGII+ HQMAHY++ K+LKI 
Sbjct: 177 NRIIKEVSGEEDIITTSQFPGTTLDAIEIPLDDGSSLYDTPGIINNHQMAHYVNKKDLKI 236 

Query: 241 VSPKKEIKPKTYQLNPEQTLFLGGLARFDFINGERQGFTAFFDNQLELHRTKLAGADAFY 300 
+SPKKE+KP+T+QLN +QTL+ GGLARFD+++GER F + N+L +HRTKL ADA Y 

Sbjct: 237 LSPKKELKPRTFQLNDQQTLYFGGLARFDYVSGERSPFICYMPNELMIHRTKLENADALY 296 

Query: 301 DKHVGTLLTPPDKKELTAFPKLVRHEFTI-DQKMDIVFSGLGWIRVNGQKDSKAIVAAWA 359 
+KH G LLTPP K E+ FP+LV H FTI D+K DIVFSGLGW+ V+ D+ V A+A 
Sbjct: 297 EKHAGELLTPPGKDEMDEFPELVAHTFTIKDKKTDIVFSGLGWVTVH---DADKKVTAYA 353 

Query: 360 PEGVAVIVRKAII 372 

P+GV V VR+++I 
Sbjct: 354 PKGVHVFVRRSLI 366 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 308/372 (82%) , Positives = 343/372 (91%) 

Query: 1 MEELFCIGCGARIQTENKDAAGYTPRAALEKGLETGELYCQRCFRLRHYNEITDVHITDD 60 
MEELFCIGCG +IQTE+K+ AG+TP AAL+KG+ETGELYCQRCFRLRHYNEITDVHITDD 

Sbjct: 1 MEELFCIGCGIQIQTEDKEKAGFTPAAALKKGMETGELYCQRCFRLRHYNEITDVHITDD 60 

Query: 61 EFLKLLHEVGDSDALVVNVIDIFDFNGSIIPGLSRFVAGNDVLLVGNKKDILPKSVKDGK 120 
EFL+LLHEVGDSDALVVNVIDIFDFNGSIIPGLSRF++GNDVLLVGNKKDILPKSVKDGK 
Sbjct: 61 EFLRLLHEVGDSDALVVNVIDIFDFNGSIIPGLSRFISGNDVLLVGNKKDILPKSVKDGK 120 

Query: 121 VTQWLTERAHEEGLRPVDVILTSAQNHHAIKDLIDTIEKYRHGQDVYVVGVTNVGKSTLI 180 

VTQWLTERAHEEGLRP+DV+LTSAQN +AIKDLI I + R+G+DVYVVGVTNVGKSTLI 
Sbjct: 121 VTQWLTERAHEEGLRPLDVMLTSAQNKYAIKDLIGRINELRNGRDVYVVGVTNVGKSTLI 180 

Query: 181 NAIIREITGSRDVITTSRFPGTTLDKIEIPLDDGSYIFDTPGIIHRHQMAHYLTAKNLKY 240 

NAII+EITG++DVITTSRFPGTTLDKIEIPLDDG++IFDTPGIIHRHQMAHYL+ K LK 
Sbjct: 181 NAIIQEITGNKDVITTSRFPGTTLDKIEIPLDDGTFIFDTPGIIHRHQMAHYLSPKELKI 240 

Query: 241 VSPKKEIKPKTYQLNSEQTLFIAGLARFDFISGQKQGFTAYFDNNLNLHRTKLVGADEFY 300 

VSPKKEIKPKTYQLN EQTLFL GLARFDFI+G++QGFTA+FDN L LHRTKL GAD FY 
Sbjct: 241 VSPKKEIKPKTYQLNPEQTLFLGGLARFDFINGERQGFTAFFDNQLELHRTKLAGADAFY 300 

Query: 301 TKHVGKLLTPPTGKEVSDFPKLVRHEFTIKDKMDIVYSGLGWIRVKSEAENPVVVAAWAP 360 

KHVG LLTPP KE++ FPKLVRHEFTI KMDIV+SGLGWIRV + ++ +VAAWAP 
Sbjct: 301 DKHVGTLLTPPDKKELTAFPKLVRHEFTIDQKMDIVFSGLGWIRVNGQKDSKAIVAAWAP 360 

Query: 361 EGVAVVLRKALI 372 
EGVAV++RKA+I 

Sbjct: 361 EGVAVIVRKAII 372 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
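
The "Final Results" blocks in these Examples give one certainty value per candidate localisation (cytoplasm, membrane, outside). As an illustration only, the sketch below reads such a block and reports the localisation with the highest certainty. The function names `parse_localisation` and `predicted_localisation` are hypothetical, and the pattern is written on the assumption that the digits may carry stray spaces (for example "0 . 2103"), as they do in this scanned text.

```python
import re
from typing import Dict, Tuple

# Assumed line layout (spacing inside the number may be irregular):
#   bacterial cytoplasm Certainty=0 . 2103 (Affirmative) < suco
_LOC_RE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*"
    r"Certainty\s*=\s*([0-9][0-9. ]*?)\s*\(([^)]*)\)"
)


def parse_localisation(text: str) -> Dict[str, Tuple[float, str]]:
    """Map each localisation to its (certainty, verdict) pair."""
    results = {}
    for name, number, verdict in _LOC_RE.findall(text):
        certainty = float(number.replace(" ", ""))  # drop stray spaces
        results[name] = (certainty, verdict.strip())
    return results


def predicted_localisation(text: str) -> str:
    """Return the localisation carrying the highest certainty value."""
    results = parse_localisation(text)
    return max(results, key=lambda name: results[name][0])


if __name__ == "__main__":
    block = (
        "bacterial cytoplasm Certainty=0 . 2103 (Affirmative) < suco\n"
        "bacterial membrane Certainty=0 . 0000 (Not Clear) < suco\n"
        "bacterial outside Certainty=0 . 0000 (Not Clear) < suco\n"
    )
    print(parse_localisation(block))
    print(predicted_localisation(block))
```

Removing the embedded spaces before conversion is a design choice made for this text; a clean report would not need it.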

Example 1800 

A DNA sequence (GBSx1907) was identified in S.agalactiae <SEQ ID 5597> which encodes the amino 
acid sequence <SEQ ID 5598>. Analysis of this protein sequence reveals the following: 



Possible site: 18 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0. 2948 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14507 GB:Z99117 similar to dihydrodipicolinate reductase 
[Bacillus subtilis] 
Identities = 49/97 (50%) , Positives = 67/97 (68%) , Gaps = 2/97 (2%) 

Query: 1 MLTSKQRAFLKSEAHSMKPIIQIGKNGLNDQIKTSVRNALDARELIKVTLLQNTDEDIHD 60 

MLT KQ+ FL+S+AH + PI Q+GK G+ND + + AL+ARELIKV++LQN +ED +D 
Sbjct: 1 MLTGKQKRFLRSKAHHLTPIFQVGKGGVNDNMIKQIAEALEARELIKVSVLQNCEEDKND 60 

Query: 61 VAEVLEDEIGCDTVLKIGRILILYKESARKENRKISV 97 

VAE L V IG ++LYKES KEN++I + 

Sbjct: 61 VAEALVKGSRSQLVQTIGNTIVLYKES--KENKQIEL 95 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5599> which encodes the amino acid 
sequence <SEQ ID 5600>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2839 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 89/102 (87%) , Positives = 98/102 (95%) 

Query: 1 MLTSKQRAFLKSEAHSMKPIIQIGKNGLNDQIKTSVRNALDARELIKVTLLQNTDEDIHD 60 

MLTSKQRAFLKSEAHS+KPI+QIGKNGLND IKTS+R ALDARELIKVTLLQNTDEDIH+ 
Sbjct: 1 MLTSKQRAFLKSEAHSLKPIVQIGKNGLNDHIKTSIRQALDARELIKVTLLQNTDEDIHE 60 



WO 02/34771 PCT/GB01/04789 



Query: 61 VAEVLEDEIGCDTVLKIGRILILYKESARKENRKISVKVKAV 102 

VAE+LE+EIGCDTVLKIGRILILYK SA+KENRK+S KVKA+ 
Sbjct: 61 VAEILEEEIGCDTVLKIGRILILYKVSAKKENRKLSPKVKAI 102 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1801 

A DNA sequence (GBSx1908) was identified in S.agalactiae <SEQ ID 5601> which encodes the amino 
acid sequence <SEQ ID 5602>. Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.66 Transmembrane 3 - 19 ( 1 - 21) 

Final Results 

bacterial membrane Certainty=0. 2062 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 10089> which encodes amino acid sequence <SEQ ID 
10090> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14506 GB:Z99117 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 85/187 (45%) , Positives = 134/187 (71%) 

Query: 38 KQIGIMGGNFNPVHNAHLVVADQVRQQLCLDQVLLMPEPQPPHIDKKETIDEQHRLKMLE 97 

K+IGI GG F+P HN HL++A++V Q LD++ MP PPH ++ D HR++ML+ 
Sbjct: 2 KKIGIFGGTFDPPHNGHDLMANEVLYQAGLDEIWEMPNQIPPHKQNEDYTDSFHRVEMLK 61 

Query: 98 LAIEGIDGLSIEPIEIERKGISYTYDTMKLLIEKNPDVDYYFIIGADMVEYLPKWHRIDE 157 

LAI+ +E +E+ER+G SYT+DT+ LL ++ P+ +FIIGADM+EYLPKW+++DE 

Sbjct: 62 LAIQSNPSFKLELVEMEREGPSYTFDTVSLLKQRYPNDQLFFIIGADMIEYLPKWYKLDE 121 

Query: 158 LVKMVQFVGVQRPKYKAGTSYPVIWVDLPLMDISSSMIRQFIKSNRQPNYLLPREVLDYI 217 
L+ ++QF+GV+RP + T YP+++ D+P ++SS+MIR+ KS + +YL+P +V Y+ 

Sbjct: 122 LLNLIQFIGVKRPGFHVETPYPLLFADVPEFEVSSTMIRERFKSKKPTDYLIPDKVKKYV 181 

Query: 218 RKEGLYK 224 
+ GLY+ 

Sbjct: 182 EENGLYE 188 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5603> which encodes the amino acid 
sequence <SEQ ID 5604>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0. 4660 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 172/210 (81%) , Positives = 196/210 (92%) 

Query: 15 MALELLTPFTKVELEEKKRDTNRKQIGIMGGNFNPVHNAHLVVADQVRQQLCLDQVLLMP 74 

MALELLTPFTKVELEE+K+++NRKQIGI+GGNFNP+HNAHLVVADQVRQQL LDQVLLMP 
Sbjct: 1 MALELLTPFTKVELEEEKKESNRKQIGILGGNFNPIHNAHLVVADQVRQQLGLDQVLLMP 60 

Query: 75 EFQPPHIDKKETIDEQHRLKMLELAIEGIDGLSIEPIEIERKGISYTYDTMKLLIEKNPD 134 

E +PPH+D KETIDE+HRL+MLELAIE ++GL+IE E+ER+GISYTYDTM L E++PD 
Sbjct: 61 ECKPPHVDAKETIDEKHRLRMLELAIEDVEGLAIETCELERQGISYTYDTMLYLTEQHPD 120 

Query: 135 VDYYFIIGADMVEYLPKWHRIDELVKMVQFVGVQRPKYKAGTSYPVIWVDLPLMDISSSM 194 

VD+YFIIGADMV+YLPKWHRIDELVK+VQFVGVQRPKYKAGTSYPVIWVDLPL+DISSSM 
Sbjct: 121 VDFYFIIGADMVDYLPKWHRIDELVKLVQFVGVQRPKYKAGTSYPVIWVDLPLIDISSSM 180 

Query: 195 IRQFIKSNRQPNYLLPREVLDYIRKEGLYK 224 

IR FIK RQPNYLLP+ VLDYI +EGLY+ 
Sbjct: 181 IRDFIKKGRQPNYLLPKRVLDYITQEGLYQ 210 

SEQ ID 5602 (GBS651) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 132 (lane 8-10; MW 53.3kDa) and in Figure 186 (lane 8; MW 53kDa). It was 
also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 
132 (lane 12; MW 28.4kDa) and in Figure 140 (lane 11; MW 20kDa). 

Purified GBS651-GST is shown in Figure 243, lane 4; purified GBS651-His is shown in Figure 229, lane 9. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1802 

A DNA sequence (GBSx1909) was identified in S.agalactiae <SEQ ID 5605> which encodes the amino 
acid sequence <SEQ ID 5606>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0 .4281 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14505 GB:Z99117 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 79/180 (43%) , Positives = 115/180 (63%) 

Query: 9 LDRTELLSKVHINMSDKRFNHVLGVERAAIELAERYGYDKEKAGLAALLHDYAKELSDDE 68 

++R E L+ V+ +++ R+ H +GV AIELAER+G D +KA +AA+ HDYAK +E 
Sbjct: 1 MNREEAIACVKQQLTEHRYIHWGVMNTAIELAERFGADSKKAEIAAIFHDYAKFRPKEE 60 

Query: 69 FLRLIDKYQPDPDLKKWGNNIWHGLVGIYKIQEDLAIKDQDILAAIAKHTVGSAQMSTLD 128 

++I + + L +WH VG Y +Q + ++D+DIL AI HT G M+ L+ 

Sbjct: 61 MKQIIAREKMPAHLLDHNPELWHAPVGAYLVQREAGVQDEDILDAIRYHTSGRPGMTLLE 120 

Query: 129 KIVYVADYIEHNRDFPGVEEARELAKVDLNKAVAYETARTVAFLASKAQPIYPKTIETYN 188 
K++YVADYIE NR FPGV+E R+LA+ DLN+A+ T+ FL K QP++P T TYN 

Sbjct: 121 KVIYVADYIEPNRAFPGVDEVRKLAETDLNQALIQSIKNTMVFLMKKNQPVFPDTFLTYN 180 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5607> which encodes the amino acid 
sequence <SEQ ID 5608>. Analysis of this protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0. 2615 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 130/194 (67%) , Positives = 159/194 (81%) 

Query: 1 MTYKDYTGLDRTELLSKVHINMSDKRFNHVLGVERAAIELAERYGYDKEKAGLAALLHDY 60 

MTY+DY RTELL+K+ MS KRF HVLGVE+AA+ LAE YG + +KAGLAALLHDY 
Sbjct: 1 MTYEDYLPYSRTELLAKIAEQMSPKRFKHVLGVEKAALSLAECYGCNPDKAGLAALLHDY 60 

Query: 61 AKELSDDEFLRLIDKYQPDPDLKKWGNNIWHGLVGIYKIQEDLAIKDQDILAAIAKHTVG 120 
AKE D FL LIDKYQ P+L KW NN+WHG+VGIYKIQEDL +KD+DIL AI HTVG 

Sbjct: 61 AKECPDQVFLDLIDKYQLSPELAKWTNNVWHGMVGIYKIQEDLGLKDKDILRAIEIHTVG 120 

Query: 121 SAQMSTLDKIVYVADYIEHNRDFPGVEEARELAKVDLNKAVAYETARTVAFLASKAQPIY 180 
+A+M+ LDK++YVADYIE R FP V++AR++AK+DLN+AVAYET TVA+LASKAQPI+ 
Sbjct: 121 AAEMTLLDKVLYVADYIEEGRIFPLVDDARKIAKLDLNQAVAYETVVTVAYLASKAQPIF 180 

Query: 181 PKTIETYNAYIPYL 194 

P+T++TYNA+ YL 
Sbjct: 181 PQTLDTYNAFCSYL 194 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1803 

A DNA sequence (GBSx1910) was identified in S.agalactiae <SEQ ID 5609> which encodes the amino 
acid sequence <SEQ ID 5610>. Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -2.34 Transmembrane 12 - 28 ( 10 - 28) 

Final Results 

bacterial membrane Certainty=0 . 1935 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 10087> which encodes amino acid sequence <SEQ ID 
10088> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG19496 GB:AE005041 Vng1100c [Halobacterium sp. NRC-1] 
Identities = 46/175 (26%) , Positives = 82/175 (46%) , Gaps = 12/175 (6%) 

Query: 22 ALLLIDIQQGIMDKK--PKHLTNFAVLLDDLLLSAKGSNCEVIWIRHHDKE----LPQGS 75 

AL+L+D QQG D ++ + ++LL + + + + +RH+ E L QG 

Sbjct: 7 ALVLVDFQCGFADPAWGDRNNPDAEAHAEELLAAWRDAAAPIAHVRHNSTEATSPLRQGE 66 

Query: 76 PQWEIWEQRHLOTHHKIIDKTYNSCFKDTHLHDYLQSKHISQLIMMGLQTEYCFDTSVKV 135 

P + + K+ N F DT L +L+ + L++ GL T++C T+V++ 

Sbjct: 67 PGFAYTDGIAPAADEPEWKSWGAFVDTALEGWLRDRDTGSLVVCGLTTDHCVSTTVRM 126 

Query: 136 AFEYGYDI FI PQGGHLTFDTPTLSGDSIKK HYENIWHHR--FATMVAKDSLL 185 

A G+D+ + + TDTLG++ H +HR FAT+ ++L 

Sbjct: 127 ADNRGFDVTLVRDATATHDR-TLDGERLPPSVVHRTALAHLRGEFATLATTATVL 180 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 5610 (GBS652) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 133 (lane 2 & 3; MW 49.7kDa + lane 4; MW 27kDa) and in Figure 186 (lane 9; 
MW 50kDa). It was also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 133 (lane 5 & 7; MW 24.8kDa) and in Figure 178 (lane 10; MW 25kDa). 
Purified GBS652-GST is shown in Figure 243, lane 9; purified GBS652-His is shown in Figure 229, lane 
10. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1804 

A DNA sequence (GBSx1911) was identified in S.agalactiae <SEQ ID 5611> which encodes the amino 
acid sequence <SEQ ID 5612>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0 . 0945 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14504 GB:Z99117 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 55/118 (46%), Positives = 82/118 (68%) 

Query: 1 MTEKDLLQLVVKAADEKRAEDIVILDLQPVTSVADYFVIMSASNSRQLEAIADNIREQVK 60 

M +K +L++ A D+KRAEDI+ LD++ ++ VADYF+I ++ +Q++AIA I++Q 
Sbjct: 1 MNQKSILKIAAAACDDKRAEDILALDMEGISLVADYFLICHGNSDKQVQAIAREIKDQAD 60 

Query: 61 GNGGDASHLEGDSKAGWVLLDLNSVVVHIFSEDERQHYNLEKLWHEAPLLDAEVFMTE 118 

NG +EG +A WVL+DL  VVVH+F +DER +YNLEKLW +APL D + M + 

Sbjct: 61 ENGIQVKKMEGFDEARWVLVDLGDVVVHVFHKDERSYYNLEKLWGDAPLADLDFGMNQ 118 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5613> which encodes the amino acid 
sequence <SEQ ID 5614>. Analysis of this protein sequence reveals the following: 

Possible site: 50 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.69 Transmembrane 91 - 107 ( 91 - 107) 

Final Results 

bacterial membrane Certainty=0 . 1277 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

>GP:CAB14504 GB:Z99117 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 55/113 (48%) , Positives = 80/113 (70%) 

Query: 17 MKKEELLKIVVEATEEKRAKDILALDLEGLTSLTDYFVIASATNSRQLEAIADNIREKVK 76 
M ++ +LKI A ++KRA+DILALD+EG++ + DYF+I + +Q++AIA I+++ 

Sbjct: 1 MNQKSILKIAAAACDDKRAEDILALDMEGISLVADYFLICHGNSDKQVQAIAREIKDQAD 60 

Query: 77 EAGGDASHVEGNSQAGWVLLDLTDVVVHLFLEDERYHYNLEKLWHEAPAVALD 129 
E G +EG +A WVL+DL DVVVH+F +DER +YNLEKLW +AP LD 

Sbjct: 61 ENGIQVKKMEGFDEARWVLVDLGDVVVHVFHKDERSYYNLEKLWGDAPLADLD 113 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 78/116 (67%) , Positives = 100/116 (85%) 

Query: 1 MTEKDLLQLVVKAADEKRAEDIVILDLQPVTSVADYFVIMSASNSRQLEAIADNIREQVK 60 

M +++LL++VV+A +EKRA+DI+ LDL+ +TS+ DYFVI SA+NSRQLEAIADNIRE+VK 
Sbjct: 17 MKKEELLKIVVEATEEKRAKDILALDLEGLTSLTDYFVIASATNSRQLEAIADNIREKVK 76 

Query: 61 GNGGDASHLEGDSKAGWVLLDLNSVVVHIFSEDERQHYNLEKLWHEAPLLDAEVFM 116 
GGDASH+EG+S+AGWVLLDL  VVVH+F EDER HYNLEKLWHEAP + + ++ 

Sbjct: 77 EAGGDASHVEGNSQAGWVLLDLTDVVVHLFLEDERYHYNLEKLWHEAPAVALDAYL 132 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1805 

A DNA sequence (GBSx1912) was identified in S.agalactiae <SEQ ID 5615> which encodes the amino 
acid sequence <SEQ ID 5616>. Analysis of this protein sequence reveals the following: 



Possible site: 19 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0. 2415 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1806 

A DNA sequence (GBSx1913) was identified in S.agalactiae <SEQ ID 5617> which encodes the amino 
acid sequence <SEQ ID 5618>. Analysis of this protein sequence reveals the following: 



Possible site: 21 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0 . 1570 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14503 GB:Z99117 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 86/242 (35%) , Positives = 154/242 (63%) , Gaps = 4/242 (1%) 

Query: 4 YETFAAVYDAVMDDTLYAKWTDFSLRHFPKGKKKLLELACGTGIQSVRFAQAGYAVTGLD 63 

Y+ FA+VYD +M Y +WT + P+ K ++L+LACGTG S+R A+ G+ VTG+D 

Sbjct: 3 YQGFASVYDELMSHAPYDQWTKWIEASLPE-KGRILDLACGTGEISIRLAEKGFEVTGID 61 

Query: 64 LSGDMLKIAKKRATSAHQSIQFIEGNMLDLSNV-GKYDLITCYSDSICYMQDEVEVGDVF 122 
LS +ML A+++ +S+ Q I F++ +M +++ G++D + DS+ Y++ + +V + F 

Sbjct: 62 LSEEMLSFAQQKVSSS-QPILFLQQDMREITGFDGQFDAVVICCDSLNYLKTKNDVIETF 120 

Query: 123 IEVYKRLEENGVFIFDVHSTYQTDKVFPGYSYHENADDFAMVWDTYEDDAPHSIVHELTF 182 
V++ L+ G+ +FDVHS+++ +VFP ++ + +D + +W ++ S++H+++F 
Sbjct: 121 KSVFRVLKPEGILLFDVHSSFKIAEVFPDSTFADQDEDISYIWQSFAGSDELSVIHDMSF 180 

Query: 183 FVQEEDGRFTRHDEVHEERTYDILTYDILLEQAGFKDVKVYADFEDKKPTATSARWFFVA 242 

FV + + R DE HE+RT+ + Y+ +L+ GF+ +V ADF D +P+A S R FF A 
Sbjct: 181 FVWNGEA-YDRFDETHEQRTFPVEEYEEMLKNCGFQLHRVTADFTDTEPSAQSERLFFKA 239 



Query: 243 HK 244 
K 

Sbjct: 240 QK 241 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5619> which encodes the amino acid 
sequence <SEQ ID 5620>. Analysis of this protein sequence reveals the following: 



Possible site: 53 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0 . 2315 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 191/243 (78%) , Positives = 215/243 (87%) , Gaps = 2/243 (0%) 

Query: 4 YETFAAVYDAVMDDTLYAKWTDFSLRHFPK--GKKKLLELACGTGIQSVRFAQAGYAVTG 61 
YE FA+VYDAVMDD+LY WTDFSLRH PK G+ +LLELACGTGIQSVRFAQAG+ VTG 

Sbjct: 21 YEKFASVYDAVMDDSLYDLWTDFSLRHLPKSKGRNRLLELACGTGIQSVRFAQAGFDVTG 80 

Query: 62 LDLSGDMLKLAKKRATSAHQSIQFIEGNMLDLSNVGKYDLITCYSDSICYMQDEVEVGDV 121 
LDLS DM+ +AKKRA SA + I FI+GNMLDLS VG++D +TCYSDSICYMQDEV+VGDV 
Sbjct: 81 LDLSQDMIAIAKKRAQSAKKKIDFIQGNMLDLSQVGQFDFVTCYSDSICYMQDEVDVGDV 140 

Query: 122 FIEVYKRLEENGVFIFDVHSTYQTDKVFPGYSYHENADDFAMVWDTYEDDAPHSIVHELT 181 

F EVY L +G+FIFDVHSTYQTD+ FPGYSYHENADDFAMVWDTY D+APHS+VHELT 
Sbjct: 141 FKEVYDVLANDGIFIFDVHSTYQTDECFPGYSYHENADDFAMVWDTYADEAPHSVVHELT 200 

Query: 182 FFVQEEDGRFTRHDEVHEERTYDILTYDILLEQAGFKDVKVYADFEDKKPTATSARWFFV 241 

FF+QE+DGRF+R DEVHEERTY++LTYDILLEQAGFK KVYADFEDK+PT TS RWFFV 
Sbjct: 201 FFIQEDDGRFSRFDEVHEERTYELLTYDILLEQAGFKSFKVYADFEDKEPTKTSKRWFFV 260 

Query: 242 AHK 244 

A+K 

Sbjct: 261 AYK 263 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1807 

A DNA sequence (GBSx1914) was identified in S.agalactiae <SEQ ID 5621> which encodes the amino 
acid sequence <SEQ ID 5622>. Analysis of this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0 .3538 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06304 GB:AP001516 unknown conserved protein [Bacillus halodurans] 
Identities = 129/367 (35%) , Positives = 184/367 (49%) , Gaps = 45/367 (12%) 

Query: 1 MTVTGIVAEFNPFHNGHKYLLEQAQ-----GIKVIAMSGNFMQRGEPAIVDKWTRSQMAL 55 

M G+V E+NPFHNGH + L +A+ + + MSG F+QRGEPAI+ KW R+ +AL 

Sbjct: 1 MKAVGVVVEYNPFHNGHLHHLTEARKQAKftDWIAVMSGYFLQRGEPAILPKWERTSLAL 60 

Query: 56 ENGADLVIELPFLVSVQSADYFASGAVSILARLGVDNLCFGTEE--MLDYARIGDIYVNK 113 

+ GADLV+ELP+ S Q A++FA+GAVSILA L D LCFG+EE + + R+ 
Sbjct: 61 QGGADLVVELPYAFSTQKAEWFATGAVSILAALEADALCFGSEEGTIEPFHRLYHFMAKH 120 

Query: 114 KEEMEAFLKKQSD-SLSYPQKMQAMWQEFAGIT--FSGQTPNHILGLAYTKAA--SQNGI 168 

+ + +K++ D +SYP ++ G PN+ILG Y KA I 

Sbjct: 121 RLAWDRMIKEELDKGMSYPTATSLAFKRLEGSAEHLDLSRPNNILGFHYVKAIYDLHTSI 180 

Query: 169 RLNPIQRQGAGYHSSEKTE-IFASATSLRK HQSDRFF VEKGMPNSD 213 

+ I R AGYH E ASATS+RK DR + K 

Sbjct: 181 KAMTIPRIKAGYHDDSLNESSIASATSIRKSLKTKEGWQMVDRWPSYTTEMLKSFEKET 240 

Query: 214 LFLNSPQVVWQDYFSLLKYQIMTHS--DLTQIYQVNEEIANRIKSQIRYVETVDELVDKV 271 
FL S W+ F LLKY+++T + L IY+ E+R I + + + K+ 
Sbjct: 241 TFLPS WERLFPLLKYRLLTATPEQLHAIYEGEEGLEYRALKTIVSATSFHDWMTKM 296 

Query: 272 ATKRYTKARIRRLLTYILINAVESPIPNA IHVLGFTQKGQQHLKSVKK-- 319 

TKRYT RI+R T++ N + I + I +LG T +GQ +L KK 

Sbjct: 297 KTKRYTWTRIQRYATHLFTlSrrTKEEIHSVLPRGTESLPYIRLLGMTSRGQMYIjNGKKKQL 356 






Query: 320 SVDIVTR 326 

+ ++TR 
Sbjct: 357 TTPVITR 363 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5623> which encodes the amino acid 
sequence <SEQ ID 5624>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0 .3165 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 221/359 (61%) , Positives = 288/359 (79%) 





Query: 1 MTVTGIVAEFNPFHNGHKYLLEQAQGIKVIAMSGNFMQRGEPAIVnKWTRSQMALENGAD 60
MTVTGI +AEFNPFHNGHKYLLE A+G+K+IAMSGNFMQRGEPA++DKW RS+MAL+NGAD
Sbjct: 1 MTVTGIIAEFNPFHNGHKYLLETAEGLRIIAMSGNFMQRGEPALIDKWIRSEMALKNGAD 60

Query: 61 LVIELPFLVSVQSADYFASGAVSILARLGVDNLCFGTEEMLDYARIGDIYVNKKEEMEAF 120
+V+ELPF VSVQSADYFA GA+ IL +LG+ L FGTE ++DY ++ +Y K E+M A+
Sbjct: 61 IWELPFFVSVQSADYFAQGAIDILCQLGIQQIAFGTENVIDYQKLIKVYEKKSEQMTAY 120

Query: 121 LKKQSDSLSYPQKMQAMWQEFAGITFSGQTPNHILGLAYTKAASQNGIRLNPIQRQGAGY 180
L D+ SYPQK Q MW+ FAG+ FSGQTPNHILGL+Y KA++ I+L PI+RQGA Y
Sbjct: 121 LSTLEDTFSYPQKTQKMWEIFAGVKFSGQTPNHILGLSYAKASAGKHIQLCPIKRQGAAY 180

Query: 181 HSSEKTEIFASATSLRKHQSDRFFVEKGMPNSDLFIiNSPQVWQDYFSLLKYQIMTHSDL 240
HS +K + ASA+++R+H +D F+ +PN+ L +N+P + W YFS LKYQI+ HSDL
Sbjct: 181 HSKDKNHLIASASAIRQHLNDWDFISHSVPNAGLLINNPHMSVTOHYFSFLKYQIIOTSDL 240

Query: 241 TQIYQVNEEIANRIKSQIRYVETvTJELVDKVATKRYTKftRIRRLLTYILINAVESPIPNA 300
T I+QVN+E+A+RIK 1+ + +D LVD VATKRYTKAR+RR+LTYIL+NA E +P
Sbjct: 241 TSIFQVNDEIASRIKKAIKVSQNIDHLVDTVATKRYTKARVRRILTYILVNAKEPTLPKG 300

Query: 301 IHVLGFTQKGQQHLKSWKSVDIVTRIGSQTWDSLTQRADSVYQMGNANIAEQTWGRIP 359
IH+LGFT KGQ HLK +KKS ++TRIG++TWD +TQ+ADS+YQ+G+ +1 EQ++GRIP
Sbjct: 301 IHILGFTSKGQAHLKKLKKSRPLITRIGAETWDEMTQKADSIYQLGHQDIPEQSFGRIP 359
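
The identity and positive-match figures quoted with each alignment are simple ratios of counts over the aligned length. The following is a minimal sketch, not part of the original analysis, of how such a summary line could be read and its ratios recomputed in Python. The function names `parse_summary` and `ratio_percent` are hypothetical, and the recomputed ratio need not equal the printed percentage exactly, since the reporting program may round or truncate differently.

```python
import re

# Matches fields such as "Identities = 221/359 (61%)"; tolerant of stray spaces.
FIELD_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Return {field: (count, length, reported_percent)} for one summary line."""
    return {
        name: (int(count), int(length), int(pct))
        for name, count, length, pct in FIELD_RE.findall(line)
    }


def ratio_percent(count, length):
    """Recompute the percentage from the raw counts."""
    return 100.0 * count / length


# Summary line copied from the GAS/GBS alignment above.
summary = parse_summary("Identities = 221/359 (61%) , Positives = 288/359 (79%)")
for name, (count, length, reported) in summary.items():
    ratio = ratio_percent(count, length)
    print(f"{name}: {count}/{length} = {ratio:.1f}% (reported {reported}%)")
```

Keeping the reported percentage next to the recomputed one makes any transcription error in the counts easier to spot.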



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1808

A DNA sequence (GBSx1915) was identified in S.agalactiae <SEQ ID 5625> which encodes the amino
acid sequence <SEQ ID 5626>. This protein is predicted to be transcriptional activator tipa. Analysis of this
protein sequence reveals the following:

Possible site: 17
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.3117 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
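
Each "Final Results" block lists one certainty value per candidate location, and the predicted location is the one with the highest value. As an illustration only, a report of this shape could be parsed as sketched below. The names `parse_final_results` and `most_likely` are hypothetical, and the pattern assumes the line layout shown in this document, including digits separated by stray spaces.

```python
import re

# One localization line: "bacterial <location> ... Certainty=<value> (<verdict>)".
# The value may contain stray spaces, e.g. "0 . 3117".
LINE_RE = re.compile(
    r"bacterial\s+(\w+)\D*?Certainty\s*=\s*([\d\s.]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Map each location to (certainty, verdict)."""
    results = {}
    for location, value, verdict in LINE_RE.findall(text):
        results[location] = (float(value.replace(" ", "")), verdict.strip())
    return results


def most_likely(results):
    """Return the location with the highest certainty."""
    return max(results, key=lambda location: results[location][0])


# Values copied from the report for SEQ ID 5626.
sample = """\
bacterial cytoplasm Certainty=0 . 3117 (Affirmative)
bacterial membrane Certainty=0 . 0000 (Not Clear)
bacterial outside Certainty=0 . 0000 (Not Clear)
"""
results = parse_final_results(sample)
print(most_likely(results), results[most_likely(results)])
```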

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15677 GB:Z99122 transcriptional regulator [Bacillus subtilis]
Identities = 91/246 (36%), Positives = 144/246 (57%), Gaps = 14/246 (5%)

Query: 4 VKEISHISGISVRTLHYYDEIDLLSPSFVGENGYRYYDDESLIKLQEILLFKELEFPLKK 63
VK+++ ISG+S+RTLH+YD I+LL+PS + + GYR YD L +LQ+IL FKE+ F L +
Sbjct: 5 VKQVAEISGVSIRTLHHYDNIELLNPSALTDAGYRLYSDADLERLQQILFFKEIGFRLDE 64

Query: 64 I KE IMDS PNYDRNQALLDQI RWLELKKQRLEEVIEHAK S I QRGKNMSD FTAYN 116
IKE++D PN+DR AL Q L KKQR++E+I+ S+ G+ M+ F +
Sbjct: 65 IKEMLDHPNFDRKAALQSQKEILMKKKQRMDEMIQTIDRTLLSVDGGETMNKRDLFAGLS 124

Query: 117 QEELEAFQ EEARTRWGD- -TDSYKEFENSHSKNDFSMISQAMSQIFKDFGQLKELS 170
+++E Q +E R +G + ++ +++S +D+ I I++ +
Sbjct: 125 MKDIEEHQQJTYADEVRKLYGKEIAEETEKRTSAYSADDWRTIMAEFDSIYRRIAARMKHG 184

Query: 171 PTDEKVQKQVQILQDYITAQFYNCTNDLLASLGIMYIQDERFQKSIDNWGGQGTALFVSK 230
P D ++Q V +D+I Y+CT D+ LG +YI DERF SI+ + G+G A F+ +
Sbjct: 185 PDDAEIQAAVGAFRDHICQYHYDCTLDIFRGLGEVYITDERFTDSINQY-GEGLAAFLRE 243

Query: 231 AIDSYC 236
AI YC
Sbjct: 244 AIIIYC 249

There is also homology to SEQ ID 1712. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1809

A DNA sequence (GBSx1916) was identified in S.agalactiae <SEQ ID 5627> which encodes the amino
acid sequence <SEQ ID 5628>. Analysis of this protein sequence reveals the following:

Possible site: 39
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2590 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14597 GB:Z99117 yrkC [Bacillus subtilis]
Identities = 56/129 (43%), Positives = 74/129 (56%), Gaps = 7/129 (5%)

Query: 2 KGFHGNIEKLTLGNTNFRQVLYTAEHCQLVLMTLPVGGEIGSEIHAENDQFFRFEAGHGK 61
K F NI + T N FR L+T +H Q+ LM+L +G +IG EIH DQF R E G G
Sbjct: 59 KPFVVNIl^TKQNOTFRTALWTGKHFQVTLMSLGIGEDIGLEIHPNVDQFLRIEQGRGI 118

Query: 62 WIDGN EYEVADGDAI I VPAGAEHNVINTSETEMLKLYTI YSPAHHKDGI IRAT 115
V + + + V D AI+VPAG HNVINT T LKLY+IY+P +H G + T
Sbjct: 119 VKMGKSKDHLNFQRNVYDDSAIVVPAGTWHIWINTGNTP-LKIiYSIYAPPIfflPFGTVHET 177

Query: 116 REEAEENEE 124
+ +A E+
Sbjct: 178 KADAVAAED 186



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1810

A DNA sequence (GBSx1917) was identified in S.agalactiae <SEQ ID 5629> which encodes the amino
acid sequence <SEQ ID 5630>. This protein is predicted to be glycerol uptake facilitator (glpF). Analysis of
this protein sequence reveals the following:



Possible site: 61
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -9.08 Transmembrane 156 - 172 ( 153 - 180)
INTEGRAL Likelihood = -6.21 Transmembrane 135 - 151 ( 132 - 155)
INTEGRAL Likelihood = -4.09 Transmembrane 86 - 102 ( 80 - 103)
INTEGRAL Likelihood = -3.93 Transmembrane 213 - 229 ( 212 - 230)
INTEGRAL Likelihood = -3.72 Transmembrane 8 - 24 ( 5 - 28)
INTEGRAL Likelihood = -2.76 Transmembrane 38 - 54 ( 36 - 58)

Final Results
bacterial membrane --- Certainty=0.4630 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
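
The INTEGRAL lines in the report above each give a likelihood score and a predicted transmembrane span. The sketch below, which is illustrative and not part of the original analysis, shows one way to collect those spans in sequence order. It assumes the likelihood values read as -9.08, -6.21, -4.09, -3.93, -3.72 and -2.76, and the name `parse_segments` is hypothetical.

```python
import re

# "INTEGRAL Likelihood = <score> Transmembrane <start> - <end>"
SEG_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text):
    """Return predicted transmembrane segments sorted by start position."""
    segments = [
        {"likelihood": float(score), "start": int(start), "end": int(end)}
        for score, start, end in SEG_RE.findall(text)
    ]
    return sorted(segments, key=lambda seg: seg["start"])


# Lines as read from the report for SEQ ID 5630 (values are an assumption
# about the intended layout of the original table).
report = """\
INTEGRAL Likelihood = -9.08 Transmembrane 156 - 172 ( 153 - 180)
INTEGRAL Likelihood = -6.21 Transmembrane 135 - 151 ( 132 - 155)
INTEGRAL Likelihood = -4.09 Transmembrane 86 - 102 ( 80 - 103)
INTEGRAL Likelihood = -3.93 Transmembrane 213 - 229 ( 212 - 230)
INTEGRAL Likelihood = -3.72 Transmembrane 8 - 24 ( 5 - 28)
INTEGRAL Likelihood = -2.76 Transmembrane 38 - 54 ( 36 - 58)
"""
segments = parse_segments(report)
# The most negative likelihood marks the most strongly predicted segment.
strongest = min(segments, key=lambda seg: seg["likelihood"])
for seg in segments:
    print(seg["start"], seg["end"], seg["likelihood"])
```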

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04811 GB:AP001510 glycerol uptake facilitator [Bacillus halodurans]
Identities = 135/230 (58%), Positives = 171/230 (73%)

Query: 1 MTQFLGEFLGTFILVLLGDGWAGNVLSKTKEEGTGWTAIVFGWGIACTVAVYVSGLFSP 60
M+ FLGE +GT IL++LG GWAG VL TK E GW I WG+A AVY G S
Sbjct: 1 MS PFLGEVIGTMI L 1 1 LGGGWAG WLKGTKSENGGWI VI TAAWGLA VATAVYCVGQI SG 60

Query: 61 AHLNPA VTLAMAS IGAI SWGQVI PFI I AQMLGAMVAATI LWLHYYPHWKETKDSGLI LAS 120
AHLNPAVT+ +A +GA W QV +I+AQMLGAM+ AT+++LHYYPH+K T+D G LA
Sbjct: 61 AHIjNPAVTIGIALVGAFEWSQVAGYIVAQMLGAMIGATLVFLHYYPHFKATEDQGAKLAV 120

Query: 121 FSTGPAIRHTPSNLLGEIIGTAILVITIMAIGPSKVAAGLGPIIVGIVIFAVGFSLDPTT 180
FST PAI+H P+N E++GT +LV+ I+AIG ++ GL P+IVG++I +G SL TT
Sbjct: 121 FSTDPAIKHLPANFFSEVLGTFVLVLGILAIGANEFTEGLNPLIVGLLIWIGLSLGGTT 180

Query: 181 GYAINPARDLGPRLMHAILPIENKGNSDWSYAWIPWGPIIGGVLGAILY 230
GYAINPARDLGPR+ H +LPI KG+S+WSYAWIP+VGPIIGG +GA+ Y
Sbjct: 181 GYAINPARDLGPRIAHFLLPIPGKGSSNWSYAWIPIVGPIIGGGIGALTY 230

There is also homology to SEQ ID 2854. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1811

A DNA sequence (GBSx1918) was identified in S.agalactiae <SEQ ID 5631> which encodes the amino
acid sequence <SEQ ID 5632>. Analysis of this protein sequence reveals the following:

Possible site: 37
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1694 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07114 GB:AP001518 unknown conserved protein in others [Bacillus halodurans]
Identities = 64/118 (54%), Positives = 85/118 (71%)

Query: 5 GIIWSHSKNIAQGWDLISEVAKDVSITYVGGTEDGEIGTSFDQVQQIVEQNDKKTLLA 64
GI++ SH +A+G+V L+ E AKDVSITY GGT+D ++G SF+++QQ V N+ L
Sbjct: 7 GIVISSHVPALAEGIVTLLKEAAKDVSITYAGGTDDDQVGASFEKIQQAVMDNEADELFV 66

Query: 65 FFDLGSAKMNLELVADFSEKNI I INS VP WEGAYTAAALLQAGADLDS IQSQLAELTI 122
F+DLGSAKMN+E+V + SEK I + V +VEGAYTAAAL Q GA ++I QL LTI
Sbjct: 67 FYDLGSAKMNVEMVMELSEKTIHLMDVALVEGAYTAAALTQGGASFETIMEQLQPLTI 124

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1812

A DNA sequence (GBSx1919) was identified in S.agalactiae <SEQ ID 5633> which encodes the amino
acid sequence <SEQ ID 5634>. Analysis of this protein sequence reveals the following:

Possible site: 59
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.4753 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07115 GB:AP001518 unknown conserved protein in others [Bacillus halodurans]
Identities = 98/190 (51%), Positives = 135/190 (70%), Gaps = 2/190 (1%)

Query: 3 VKTAIEWMHTFNQKIQSNKDYLSELDTPIGDGDHGGNMARGMTAVIENLDNNEFSSAADV 62
V+ +W+H F++K+Q+N+ YLSELD+ IGDGDHG NMARG+ V L NFS +V
Sbjct: 4 VENTTKWLHAFHEKVQANQSYLSELDSAIGDGDHGTNMARGLAEVERKLKENLFESPQEV 63

Query: 63 FKTVSMQLLSKVGGASGPLYGSAFMGITK-AEQSKSTISEALGAGLEMIQKRGKAELNEK 121
K +M L+SK GGASGPLYG+A + ++K I +++ AGL I KRGKA EK
Sbjct: 64 LKM?iAMALISKTGGASGPLYGTALLEMSKQVANDPQNIGKSIEAGLNGILKRGKATTGEK 123

Query: 122 TMVDVWHGVIEAI-EKNELTEDRIDSLVDATKGMKATKGRASYVGERSVGHIDPGSFSSG 180
TMVD+W V+E++ + +L+++RI V TK MKATKGRASY+GERS+GH+DPG+ SSG
Sbjct: 124 TMVDIWKPWESLMAEQQLSKERIQQFVSETKEMKATKGRASYLGERSLGHLDPGAVSSG 183

Query: 181 LLFKALLEVG 190
LF+A+++ G
Sbjct: 184 YLFEAMIDGG 193

WO 02/34771

PCT/GB01/04789

-2047-

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1813

A DNA sequence (GBSx1920) was identified in S.agalactiae <SEQ ID 5637> which encodes the amino
acid sequence <SEQ ID 5638>. This protein is predicted to be dihydroxyacetone kinase (M200). Analysis
of this protein sequence reveals the following:

Possible site: 59
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2080 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



>GP:BAB07116 GB:AP001518 dihydroxyacetone kinase [Bacillus halodurans]
Identities = 204/329 (62%), Positives = 261/329 (79%)

Query: 1 MKKI LNQPTDWTEMLDGLAYVHNDLVHRIEGFD 1 1 ARNEEKSGKVAL I SGGGSGHEPSH 60
MKKILN P +V+ EMLDG Y + LV R+ G +1 R E GKVAL+ SGGGSGHEPSH
Sbjct: 1 MKKILNDPQNVLDEMLDGFVYANGHLVERVAGTGVIRRTYEDKGKVALVSGGGSGHEPSH 60

Query: 61 AGFVGEGMLSAAVCGAVFTSPTPDQTOEAIKEADEGAGVTOVTKNYSGDIMNFEMAQDMA 120
AGFVG+GMLSAAVCG VFTSPTPDQ+ E IK AD+G GV ++IKNY+GD+MNFEMA +MA
Sbjct: 61 AGFVGQGMLSAAVCGEVFTSPTPDQIFEGIKAADQGGGVLLIIKNYTGDVMNFEMAGEMA 120

Query: 121 EMEGIEVASVVVDDDIAVEDSLYTCjGKRGVAGTILVHKILGHAARHGKSLQEIKAIADEL 180
E EGI V ++V+DDIAVEDS +T G+RGVAGTI+VHKI+G AA G SLQ +K + + +
Sbjct: 121 FJffiGITVDHIIVNDDIAVEDSSFTAGRRGVAGTIIVHKIVGAAAEAGLSLQSLKVLGETV 180

Query: 181 VPNIHTVGLALSGAWPEVGKPGFVLAEDEIEFGIGIHGEPGYRKEKMQPSKALATELVD 240
+ N T+G+++ ATVP VGKPGF h +DE+E+G+GIHGEPGYRKEK++ SK +A EL+
Sbjct: 181 IENTKTIGVSILPATVPAVGKPGFELGDDEMEYGVGIHGEPGYRKEKLKSSKEIAEELIL 240

Query: 241 KLIESFDAKSGEKYGVLINGMGATPLMEQYVFANDVAKLLEDKGIEVNYKKLGNYMTSID 300
KL E+F G+KYGVL+NG+GATPLMEQYVF NDVA L ++G+ + +KK+G++MTSID
Sbjct: 241 KLKEAFGWSKGDKYGVLWGLGATPLMEQYVFMNDVANKLTEEGLNIQFKKVGSFMTSID 300

Query: 301 MAGLSLTLI KLENQEWLEALNSDVTT I AW 329
MAG+SLTLIK+ ++WL+ N +V T+ W
Sbjct: 301 MAGVSLTLIKIVEEKWLDYWNHEVKTVDW 329



No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1814

A DNA sequence (GBSx1921) was identified in S.agalactiae <SEQ ID 5639> which encodes the amino
acid sequence <SEQ ID 5640>. Analysis of this protein sequence reveals the following:

Possible site: 23
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1997 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07113 GB:AP001518 unknown [Bacillus halodurans]
Identities = 59/142 (41%), Positives = 82/142 (57%), Gaps = 5/142 (3%)

Query: 1 MTSSLITKKKIAKSFKRLFISQAFDKISVSDIMEDAGIRRQTFyNHFVDKYALIiEWIFQT 60
MT+S+ITKK IAK+FK L Q F KISVSDIM A +RRQTFY HF DK+ LL WI++
Sbjct: 1 MTNSIITKro/'IAKAFKDLMEVQPFSKISVSDIMNRANMRRQTFYYHFQDKFELIiHWIYKQ 60

Query: 61 ELSEQVTDNLDYISGFQLLSELLTFFKMNQEFYIKLFQIEDQNDFSSYFESYCEQLVDKL 120
EE D h Y + L+ +F NQ FY + + ON F+ Y + + L
Sbjct: 61 ETKEHSIDFLAYDDIHTIFRHLMHYFYENQTFYQRAMWNGQNGFTDYLYEH1QTL Y 117

Query: 121 LSDYSKSNFNQKERVTFINYHS 142
L++ + +QK+R +++S
Sbjct: 118 LNEIDRR--SQKDREFISSFYS 137



A related DNA sequence was identified in S.pyogenes <SEQ ID 5641> which encodes the amino acid
sequence <SEQ ID 5642>. Analysis of this protein sequence reveals the following:

Possible site: 31
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2101 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 31/115 (26%) , Positives = 58/115 (49%) , Gaps = 6/115 (5%) 

Query: 7 TKKKIAKSFKRLFISQAFDKISVSDIMEDAGIRRQTFYNHFVDKYALLEWIFQTELSEQV 66
TK + + L Q+F+ ++VSD+ + AGI R TFY H+ DK+ ++ F+ + + +
Sbjct: 8 TKAYVKTALTTLLTEQSFETLTVSDLTKKAGINRGTFYLHYTDKFDMMNH-FKNDTLDDL 66

Query: 67 TDNLD YISGFQLLSELLTFFKMNQEFYIKLFQIEDQNDFSSYFESYCEQLV 117
L+ Y Q+L++ L++ ++EF LI F + +C Q +
Sbjct: 67 YRLLNQAEIYTDTRQVLNQTLSYLIEHREFITALATI-SYLKFPQLIKDFCYQFL 120

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1815 

A DNA sequence (GBSx1922) was identified in S.agalactiae <SEQ ID 5643> which encodes the amino
acid sequence <SEQ ID 5644>. Analysis of this protein sequence reveals the following:

Possible site: 45
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1974 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1816 

A DNA sequence (GBSx1923) was identified in S.agalactiae <SEQ ID 5645> which encodes the amino
acid sequence <SEQ ID 5646>. This protein is predicted to be dihydroxyacetone kinase (b1200). Analysis
of this protein sequence reveals the following:

Possible site: 55
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1806 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB07112 GB:AP001518 dihydroxyacetone kinase [Bacillus halodurans]
Identities = 141/285 (49%), Positives = 197/285 (68%), Gaps = 1/285 (0%)

Query: 45 IPILSGGGSGHEPAHFGYVGEGMLSAAISGPIFVPPCASDILETIRFINRGKGVFVIIKN 104
+PI+SGGGSGHEP H GYVGEGML+AA+ G +FVPP A +L IR +++GKGV +IIKN
Sbjct: 46 VPIISGGGSGHEPGHLGWGEGMLAAAVHGDVFVPPSAQQVLAAIRQMDQGKGVLLIIKN 105

Query: 105 FEADLEEFSQAIEQARQEGIPIKYIVSHDDISVET-SNFKIRHRGVAGTVLLHKIIGQAA 163
F ADL F A QAR EG + +++ +DD+SVE+ ++F+ R RGVAG VL+HKIIG AA
Sbjct: 106 FVADIATFLSAEVQARAEGRDVAHVIVNDDVSVESDASFEKRRRGVAGAVLVHKIIGAAA 165

Query: 164 LEGASLDELEQLGLSLTTSMATIiGVASKSATILGQHQPVFDIEEGYISFGIGIHGEPGYR 223
EG SL+ B+++G + ++ATLGVA A + + +P F +EEG + FG+GIHGE GYR
Sbjct: 166 KEGYSLEALQEIGEQVVKNLATLGVALTHADLPERREPQFLLEEGEVYFGVGIHGEQGYR 225

Query: 224 TMPFVSMEHLANELWKLKMKLRWQDGEAFILLINNLGGSSKMEELLFTNAVMEFLALDD 283
VS E LA ELVNKLK RW + + +LIN LGG+ +E+ +F N V LA+++
Sbjct: 226 KEKLVSSELLAVELVNKLKSLYRWDKNDQYAVLINGLGGTPLIEQYVFANDVRRLLAIEN 285

Query: 284 LQLPFIKTGHLITSLDMAGLSVTLCRVKDSRWIDYLKHKTDARAW 328
L + F+K G +TSL+M G+S+T+ ++ D +W+ +L D W
Sbjct: 286 LHVSFVKVGTQLTSLNMKGISLTMLKICDEQWVKWLYAPVDVAHW 330

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1817

A DNA sequence (GBSx1924) was identified in S.agalactiae <SEQ ID 5647> which encodes the amino
acid sequence <SEQ ID 5648>. Analysis of this protein sequence reveals the following:

Possible site: 53
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.3902 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 10085> which encodes amino acid sequence <SEQ ID 
10086> was also identified. 




The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC75047 GB:AE000290 orf, hypothetical protein [Escherichia coli K12] 
Identities = 182/237 (76%) , Positives = 201/237 (84%) 

Query: 20 MGRKWANIVAKKTAKDGftNSKVYAKFGVEIYVAAKQGEPDPESNSALKPVLDRAKQAQVP 79
MGRKWANIVAKKTAKDGA S K+ YAKFGVE I Y AAKQGEPDPE N+ +LKFV+ +RAKQAQ VP
Sbjct: 1 MGRKWANIVAKKTAKDGATSKIYAKFGVEIYAAAKQGEPDPELNTSLKFVIERAKQAQVP 60

Query: 80 KHVIDKAIDKAKGlsrrDETEVEGRYEGFGPNGSMIIvTDTLTSNvmTAMWRTAYGKNGGN 139
KHVIDKAIDKAKG DETFV+GRYEGFGPNGSMII +TLTSNVNRT ANVRT + K GGN
Sbjct: 61 KHVIDKAIDKAKGGGDETFVQ^RYEGFGPNGSMIIAETLTSNVmTIAWVRTIFNKKGGN 120

Query: 140 MGASGSVSYLFDKKGVIVFAGDDADWFEQLLEADVDVDDVEAEEGTITVYTAPTDLHKG 199
+GA+GSVSY+FD GVIVF G D D +FE LLEA+VDV DV EEG I +YT PTDLHKG
Sbjct: 121 IGAAGSVSYMFDNTGVIVFKGTDPDHIFEILLEAEVDVRDVTEEEGNIVIYTEPTDLHKG 180

Query: 200 IQALRDNGVEEFQVTELEMIPQSEVVLEGDDLETFEKLIDALESDDDVQKVYHNVAD 256
I AL+ G+ EF TELEMI QSEV L +DLE FE L+DALE DDDVQKVYHNVA+
Sbjct: 181 IAALKAAGITEFSTTELEMIAQSETOLSPEDLEIFEGLTOALEDDDDVQKVYHNVAN 237



A related DNA sequence was identified in S.pyogenes <SEQ ID 5649> which encodes the amino acid 
sequence <SEQ ID 5650>. Analysis of this protein sequence reveals the following: 

Possible site: 34
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2926 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 233/238 (97%), Positives = 236/238 (98%) 

Query: 20 MGRKWANIVAKKTAKDGANSKVYAKFGVEIYVAAKQGEPDPESNSALKFVLDRAKQAQVP 79
MGRKWANIVAKKTAKDGA SKVYAKFGVEIYVAAKQGEPDPE N+ALKFV+DRAKQAQVP
Sbjct: 1 MGRKWANIVAKKTAKDGATSKWAKFGVEIYVAAKCGEPDPELNTALKFVIDRAKQAQVP 60

Query: 80 KHVIDKAIDKAKGNTDETFVEGRYEGFGPNGSMIIVDTLTSNVNRTAANVRTAYGKNGGN 139
KHVIDKAIDKAKGNTDETFVEGRYEGFGPNGSMIIVDTLTSNVNRTAANVRTAYGKNGGN
Sbjct: 61 KHVIDKAIDKAKGNTDETFVEGRYEGFGPNGSMIIVDTLTSNVNRTAANVRTAYGKNGGN 120

Query: 140 MGASGSVSYLFDKKGVIVFAGDDADTVFEQLLEADVDVDDVEAEEGTITVYTAPTDLHKG 199
MGASGSVSYLFDKKGVIVFAGDDAD+VFEQLLEADVDVDDVEAEEGTITVYTAPTDLHKG
Sbjct: 121 MGASGSVSYLFDKKGVIVFAGDDADSVFEQLLEADVDVDDVEAEEGTITVYTAPTDLHKG 180

Query: 200 IQALRDNGVEEFQVTELEMIPQSEWLEGDDLETFEKLIDALESDDDVQKVYHNVADF 257
IQALRDNGWEFQWELEMIPQSEVVLEGDDLETFEKLIDALESDDDVQKVYHNVADF
Sbjct: 181 IQALRDNGVEEFQVTELEMIPQSEVVLEGDDLETFEKLIDALESDDDVQKVYHNVADF 238

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1818 

A DNA sequence (GBSx1925) was identified in S.agalactiae <SEQ ID 5651> which encodes the amino
acid sequence <SEQ ID 5652>. Analysis of this protein sequence reveals the following:

Possible site: 17
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2507 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1819 

A DNA sequence (GBSx1926) was identified in S.agalactiae <SEQ ID 5653> which encodes the amino
acid sequence <SEQ ID 5654>. Analysis of this protein sequence reveals the following:

Possible site: 52
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1523 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA20826 GB:AL031541 hypothetical protein SCI35.37 [Streptomyces coelicolor A3(2)]
Identities = 73/178 (41%), Positives = 101/178 (56%), Gaps = 2/178 (1%)

Query: 35 VKNAGGLPVILPISEAESAKAyVEMIDKLIISGGQNVLPSYYGEEKIIESDDYSLARDIF 94
V+ AGGL +LP E A A V +D ++I+GG +V P YG E + + ARD +
Sbjct: 37 VQRAGGLAAMLPPDAPEHAAATVARVDGW 96

Query: 95 EFALVEEALKQNKPIFAICRGMQLVIWAI^TIJTQSIDNHYQEPYIGFAHYLNVEKGSFL 154
E AL+E AL P+ ICRGMQL+NVALGGTL Q 1+ H + + H + G+
Sbjct: 97 EIiALIEAALAARVPLLGICRGMQLLNVAIiGGTLVQHIERHAEVVGVFGGHPVRPVPGTLY 156

Query: 155 EGFISGDFKINSLHRQSVKLLAEGLIVSARDPRDGTVEAYESRT-EQCIIGVQWHPEL 211
G + + + + H Q+V L GL+ SA DGTVEA E + ++GVQWHPE+
Sbjct: 157 AGAVPEETFVPTYHHQAVDRLGSGLVASAH-AADGTVEALEMPSGSGWVLGVQWHPEM 213

A related DNA sequence was identified in S.pyogenes <SEQ ID 5655> which encodes the amino acid 
sequence <SEQ ID 5656>. Analysis of this protein sequence reveals the following: 

Possible site: 52
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1210 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 111/230 (48%) , Positives = 145/230 (62%) , Gaps = 3/230 (1%) 

Query: 2 LTKPIIGITGNEREMSDIPGYYYDSVSRHISEGVKNAGGLPVILPISEAESAKAYVEMID 61
+TKPIIGIT N+R + + + V +GGLP++LPI + +AK YV M+D
Sbjct: 1 MTKPIIGITANQRIiNMALDNLPWSYAPTGFVQAVTQSGGLPLLLPIGDEAAAKTWSMVD 60

Query: 62 KLI I SGGQNVLPS YYGEEKI IESDDYSLARDIFEFALVEEALKQNKP I FAI CRGMQL VNV 121
K+I+ GGQNV P YY EEK DD+S RD FE A+++EA+ KPI ICRG QL+NV
Sbjct: 61 KIILIGGQNVDPKYYQEEKAAFDDDFSPERDTFELAIIKEAITLKKPILGICRGTQLMNV 120

Query: 122 ALGGTLNQSIDNHYQE-PYIGFAHYI.NVEKGSFLEGFISGDFKINSLHRQSVKLLAEGLI 180
ALGG LNQ ID+H+QE P +H + +E S L INS HRQS+K +A+ L
Sbjct: 121 ALGGNLNQHIDSHWQEAPSDFLSHEMIIEPDSILYPIYGHKTLINSFHRQSLKWAKDLK 180

Query: 181 VSARDPRDGTVEAYESRTEQC-IIGVQWHPELMLH-QIENQTLFGYFVNE 228
V ARDPRDGT+EA S + +GVQWHPEL+ + E+ LF FVN+
Sbjct: 181 VIARDPRDGTIFAVISTNDAIPFLGVQWHPELLQGVRDEDLQLFRLFVND 230

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1820 

A DNA sequence (GBSx1927) was identified in S.agalactiae <SEQ ID 5657> which encodes the amino
acid sequence <SEQ ID 5658>. Analysis of this protein sequence reveals the following:

Possible site: 31
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.5794 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1821 

A DNA sequence (GBSx1928) was identified in S.agalactiae <SEQ ID 5659> which encodes the amino
acid sequence <SEQ ID 5660>. Analysis of this protein sequence reveals the following:

Possible site: 15
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.0524 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8905> which encodes amino acid sequence <SEQ ID 8906>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: 22 Crend: 4
McG: Discrim Score: 8.37
GvH: Signal Score (-7.5): -0.64
Possible site: 21
>>> May be a lipoprotein
ALOM program count: 0 value: 6.74 threshold: 0.0
PERIPHERAL Likelihood = 6.74 112
modified ALOM score: -1.85

*** Reasoning Step: 3

Final Results
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear)

A related DNA sequence was identified in S.pyogenes <SEQ ID 2919> which encodes the amino acid
sequence <SEQ ID 2920>. Analysis of this protein sequence reveals the following:

Possible site: 21
>>> May be a lipoprotein

Final Results
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 120/162 (74%) , Positives = 141/162 (86%) , Gaps = 5/162 (3%) 

Query: 6 LAACSSKSHTTKTGK KEVNFATVGTTAPFSYVKDGKLTGFDIEVAKAVFKGSDNYK 61
LAAC S S T ++G KEV FATVGTTAPFSY K G+LTG+DIEVAKAVFKGSD+YK
Sbjct: 20 LAACGS - SKTAESGNQGSSKEVLFATVGTTAPFSYEKGGQLTGYDIEVAKAVFKGSDDYK 78

Query: 62 VTFKKTEWSSVFTGIDSGKFQMGGNNISYSSERSQKYLFSYPIGSTPSVLAVPKNSNIKA 121
V+ FKKTEWS S + FTG+DSGK+QMGGNNI S+ + ERS KYLFSYPIGSTPSVL VPK+S+IK+
Sbjct: 79 V3FKKTEWSSIFTGLDSGKYQMGGNNISFTKERSAKYLFSYPIGSTPSVLWPKDSDIKS 138

Query: 122 YNDISGHKTQWQGTTTAKQLENFNKEHQKNPVTLKYTNENL 163
++DI GH TQWQGTT+ QLE+FNK+H NPVTLK+TNEN+
Sbjct: 139 FDDIQGHTTQWQGTTSVAQLEDFNKKHSDNPVTLKFTNENI 180

SEQ ID 8906 (GBS71) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 17 (lane 4; MW 31.8kDa). 

GBS71-His was purified as shown in Figure 196, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1822 

A DNA sequence (GBSx1929) was identified in S.agalactiae <SEQ ID 5661> which encodes the amino
acid sequence <SEQ ID 5662>. Analysis of this protein sequence reveals the following:

Possible site: 18
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2179 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
There is also homology to SEQ ID 2920:



Identities = 64/91 (70%) , Positives = 78/91 (85%) 

Query: 1 MSDGKADFKLFDGPTTOAIIKNQGLTNCjKTIPLTMRDQPYIYFIFGQDQKDLQKYVNNRL 60
+S+GKADFK+FD PTVNAI I KNQGL NLKH LT +QP+IYFIF QDQ+ LQ +VN R+
Sbjct: 187 LSEGKADFKIFDAPTVNAIIKNQGLDNLKTIELTSTEQPFIYFIFSQDQEKLQSFVNKRI 246

Query: 61 KQLRKDGTLSKIAKEYLGGDYVPNEKDLVTP 91
K+L DGTLSK+AKE+LGGDYVP++K+L P
Sbjct: 247 KELTADGTLSKLAKEHLGGDYVPSDKELKLP 277

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1823 

A DNA sequence (GBSx1930) was identified in S.agalactiae <SEQ ID 5663> which encodes the amino
acid sequence <SEQ ID 5664>. This protein is predicted to be 28 kDa outer membrane protein (yaeC).
Analysis of this protein sequence reveals the following:

Possible site: 41
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.44 Transmembrane 25 - 41 ( 25 - 42)

Final Results
bacterial membrane --- Certainty=0.1574 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB59825 GB:AJ012388 hypothetical protein [Lactococcus lactis]
Identities = 110/283 (38%), Positives = 175/283 (60%), Gaps = 13/283 (4%)

Query: 22 KLKHIVLGLALTTLLGV TFSNQEVSASSTSSKVVKVGVMTFSDTEKARWDKIEKLV 77
K ++I++ +A+ L+ + + ++Q +S K VKVG+M+ ++ W +
Sbjct: 4 KNRNIIIAVAVLILVALVAFFSLNHQGGVKASAGEKTVKVGIMSGDKQDQEWKSVANTA 63

Query: 78 GDK- -AKIKFTEFTDYTQPNCKTANKDVDIN^^ 135
+K K+KF F+DY QPN+A + D+DINAFQ YN+++ WNK +K +++ + TY+ P
Sbjct: 64

Query: 136
+ IYS+++ L LK+G+T+AI PNDA+N SRAL+VLQSAGL+KL S K+ + +IT
Sbjct: 124

Query: 195
+ +E+DASQTPRAL V +++N Y A+L S+++F+E +K S Q+IN IA
Sbjct: 184

Query: 255
K+KN K + + AY + +K IK+ D +P W
Sbjct: 244 - - -TTSKEKNNKVYKEVAKAYASKATEKAIKEQYPDGGELPAW 283



There is also homology to SEQ ID 2132.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

A related GBS gene <SEQ ID 8907> and protein <SEQ ID 8908> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 4
McG: Discrim Score: 7.47
GvH: Signal Score (-7.5): -4.79
Possible site: 21
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -1.44 threshold: 0.0
INTEGRAL Likelihood = -1.44 Transmembrane 5 - 21 ( 5 - 22)
PERIPHERAL Likelihood = 5.20 147
modified ALOM score: 0.79

*** Reasoning Step: 3

Final Results
bacterial membrane --- Certainty=0.1574 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

40.6/63.1% over 279aa
Lactococcus lactis
GP|6165402| hypothetical protein Insert characterized
ORF00442 (364 - 1182 of 1482)
GP|6165402|emb|CAB59825.1||AJ012388 (4 - 283 of 287) hypothetical protein {Lactococcus lactis}
%Match = 21.0
%Identity = 40.6 %Similarity = 63.0
Matches = 112 Mismatches = 96 Conservative Sub.s = 62

162 ' 192 222 252 282 312 342 372 

WDTFKNS*RIPWR*LRTK*ERSRYS*GEWIKTKEMSILSFLIiYSLKL*QETvYNNLILITSYGIISLSQKLREFIMKLK 

■ ' I : 

MNPKNR

402 450 480 510 540 564 594 

HIVLGLALTTLLG- -VTFS- -NQEVSASSTSSKWKVGVMTFSDTEKARWDKIEKLVGDK- -AKIKFTEFTDYTQPNQAT 

■ :h: = 1 = 1= II =1 =1 I I 111 = 1= == I = =1 1 = 11 1 = 11 I I 1 = 1 

NIIIAVAVIjILVALVAFFSIJfflQGGvTCASAGEKTVKVGIMSGDKQD

20 30 40 50 60 70 80 

624 654 684 714 744 774 804 834

ANKDVDINAFQHYNFLENWNKENKKNLIPLEKTYIA^ 

30 : | : | 1 I | 1 I | I : : : I I I : I : : : = I I = I = I I I = = = I I I = I = I = I I I I I I = I I I I I = I I I I I I I = I 

LSGDIDINAFQSYNYVTCTWNKAHKSDIVAVGNTYITPMHIYSKEISKLS 

100 110 120 130 140 150 160 

861 891 921 951 981 1011 1041 1071

IiWS-GKKVATVANITSNKKDINIQELDASQTPRALKDVDAAIINNTYIEQANLKPSDAIFVEKSDKNSKQ

II 1= = =11 I = =1=111111111 I ===l I 1=1 l===l=l =1 I 1=11 II 
LTTSDSSKLVGLPDITENPHQLKFKEVDASQTPRALDSVALSVVNYNYATAASLPKSESVFMEPLNKTSAQYINFIA 

180 190 200 210 220 230 240 

1101 1131 1161 1182 1212 1242 1272 1302

NWKKQKNAKAIQAI LDAYHTDEVKKVI KDTSAD IPQW*RELTV*V*QGILIGYNLSAI*P*RAWDEYNVPGSWIVFE 

hll I = = II : =1 11= I =11 =1 
TTSKEKNNKVYKEVAKAYASKATEKAI KEQYPDGGELPAWDLKL 
260 270 280 

SEQ ID 8908 (GBS35) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 11 (lane 2; MW 31.6kDa).

The GBS35-His fusion product was purified (Figure 96A; see also Figure 192, lane 6) and used to immunise
mice (lane 2 product; 20 µg/mouse). The resulting antiserum was used for Western blot (Figure 96B), FACS
(Figure 96C), and in the in vivo passive protection assay (Table III). These tests confirm that the protein is
immunoaccessible on GBS bacteria and that it is an effective protective immunogen.

Example 1824 

A DNA sequence (GBSx1931) was identified in S.agalactiae <SEQ ID 5665> which encodes the amino
acid sequence <SEQ ID 5666>. Analysis of this protein sequence reveals the following:

Possible site: 37
>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.3126 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF11560 GB:AE002038 ArgE / DapE / Acyl family protein [Deinococcus radiodurans]
Identities = 129/419 (30%), Positives = 210/419 (49%), Gaps = 14/419 (3%)

Query: 26 LRDLIAIKSIFAQKVGUTOLSSYLGEVFIKAGftEVIIDDSYSAPFIVANFKSSKVDAKRI 85 

LR L+A+ S+ AQ L + + + + G V AP ++A + 

Sbjct: 16 LRALVALPSVSAQGRMLPETADAVAGLLRAEGFGVQQFPGTVAPVLLAEAGEGPFT---L 72 


Query: 86 IFYlTOYDTVPADEVEQWTEDPFTLSLRyGKMYGRGVDDDKGHITARLSAVKKYLSRHKGE 145 

+ YNHYD P D +E W PF L+ R G++YGRG DDKG + +RL+AV+ + G 
Sbjct: 73 LIYNHYDVQPEDPLELWDTPPFELTERGGRLYGRGASDDKGELASRLAAVRA-VREQLGH 131 

Query: 146 LPLDITFIVEGAEESASVGLDYYLEKYQEQLQGADLIVWEDGPKNPKGQLEIAGGNKGIV 205

LP+ I +++EG EE S L+ ++ ++ +LQ AD WE G +P+G+ ++ G KG++ 
Sbjct: 132 LPVKIKWLIEGEEEVGSPTLERFVAEHAAELQ-ADGCWWEFGGISPEGRPILSLGLKGVM 190 

Query: 206 TFDLSVSSADVDIHSSFGGWDSSTWYLIQALNTLRDNKGHILVEGIYDKVIPPTKRELE 265 
+L AD D+HSS G V+D+ + L +A+ +LRD +G++ + G YD V + + +

Sbjct: 191 CLELRCRVADSDLHSSLGAVIDNPLYCLARAVASLRDEQGNVTIPGFYDDVRAASGADRQ 250 

Query: 266 LVEKYSYRSAKALEGAYQLVLPSLADSHKTFLRKLYFEPSIAIEGITSGYQGEGVKTILP 325 
+ + +A+ ++P + + + P + + G GYQGEG KT+LP 

Sbjct: 251 AIAQIP-GDGQAWDTFGVRRP--LATGPAYNERTI^HPVVNVNGWGGGYQGEGSKTVLP 307

Query: 326 AYAKCKAEVRLVPGLTPKGVLDSIQNHLKENGFKDIELT-YTLGEMSYRSDMSAPSILKV 384 

K + RLVP P VL ++ HL G DIE+ + R+D P + 

Sbjct: 308 GAGFVKLDFRLVPDQDPARVLSLLREHLTAQGLSDIEWELEAHQKPARADAGHPFVQAC 367 


Query: 385 VDDAEQFYPEGI SLLPTSPGTGPMY LVHQALRAPIAAIGIGHANSRDHGVDENV 438 

V A + + + P+S +GPM+ L . P A+GIG+ R H +EN+ 

Sbjct: 368 VAAARAAHGQDPIVHPSSGASGPMFPFTGGAGGGGLGIPCVAVGIGNHAGRVHAPNENI 426 

There is also homology to SEQ ID 2588.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1825 

A DNA sequence (GBSx1932) was identified in S.agalactiae <SEQ ID 5667> which encodes the amino
acid sequence <SEQ ID 5668>. This protein is predicted to be amino acid ABC transporter, ATP-binding
protein. Analysis of this protein sequence reveals the following:

Possible site: 47

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.5366 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB59828 GB:AJ012388 hypothetical protein [Lactococcus lactis]
Identities = 187/338 (55%), Positives = 256/338 (75%), Gaps = 12/338 (3%)

Query: 6 IIKLDNIDVTFHQKKREINAVKDVTIHINQGDIYGIVGYSGAGKSTLVRVINLLQEPSAG 65 
II+L+N+ V FHQK R + AVK+ T+HI +GDIYG++GYSGAGKSTLVR INLLQ+P+ G

Sbjct: 4 IIEIiNNLSVQFHQKGRLVTAVKNATLHIEKGDIYGVIGYSGAGKSTLVRTINLLQKPTEG 63 



Query: 66 KITIDDQVIYD--NKVTLTSTQLREQRREIGMIFQHFNLMSQLTAEQNVAFALKHSG 120 

+1 1+ + I+D N V T +LRE R++IGMIFQHFNL+S+ T NVAFAL+HS
Sbjct: 64 QIVINGEKIFDSENPVKFTGAKLREFRQKIGMIFQHENLLSEKTVFNNVAFALQHSQIED 123

Query: 121 LSKEAKAAKVAKLLELVGLSDRAQNYPSQLSGGQKQRVAIARALANDPKILIS 173 

L+K+ K KV +LL+LV L+D + YP+QLSGGQKQRVAIARALANDP+ILIS 
Sbjct: 124 KNGKKRYLTKKEKNDKVTELLKLVDLADLSDKYPAQLSGGQK^ 183 

Query: 174 DESTSALDPKTTKQIIALLQDmKKMLTIVLITHEMQIVKDIANRVAVMQNGKLIEEGS 233 

DE TSALDPKTT QIL LL+ L++KLG+T+VL1THEMQ+VK+IAN+VAVMQNG++IE+ S 
Sbjct: 184 DEGTSALDPKTOTQILDLLKSLHEKMITVVLITHEMQWKEIANKVAVMQWGEIIEQNS 243 

Query: 234 VLDIFSHPRESLTQDFIKIATGIDEAMLKIEQQEWKNLPVGSKLVQLKYAGHSTDEPLL 293 

++DIF+ P+E+LT+ FI+ + ++ + + + E++ L +L+ L Y+G ++P++ 
Sbjct: 244 LIDIFAQPKEALTKQFIETTSSVNRFIASLSKTELLAQLADDEELIHLDYSGSELEDPW 303 

Query: 294 NQIYKEFEVTANILYGNIE ILDGI PVGEMWILSGDEE 331

+ I K+F+VT NI YGN+E+L G P G +V+ L G E 
Sbjct: 304 SDITKKFDVTTNI FYGNVELLQGQPFGSLVLTLKGSSE 341 

There is also homology to SEQ ID 76. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1826 

A DNA sequence (GBSx1933) was identified in S.agalactiae <SEQ ID 5669> which encodes the amino
acid sequence <SEQ ID 5670>. This protein is predicted to be ABC transporter, permease protein. Analysis
of this protein sequence reveals the following:

Possible site: 55

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-12.79 Transmembrane 203 - 219 ( 197 - 225)
INTEGRAL Likelihood = -8.86 Transmembrane 73 - 89 ( 69 - 102)
INTEGRAL Likelihood = -7.38 Transmembrane 38 - 54 ( 35 - 56)
INTEGRAL Likelihood = -1.12 Transmembrane 103 - 119 ( 103 - 119)
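The INTEGRAL lines above share a fixed layout: a likelihood value, the predicted transmembrane span, and a wider span in parentheses. As a minimal sketch, assuming the lines have already been cleaned of scan debris, they could be read into records as follows. The names `TmSegment` and `parse_integral` and the regular expression are illustrative assumptions and are not part of the analysis software that produced this output.

```python
import re
from typing import NamedTuple, Optional


class TmSegment(NamedTuple):
    likelihood: float  # value printed after "Likelihood ="
    start: int         # first residue of the predicted segment
    end: int           # last residue of the predicted segment
    outer_start: int   # start of the wider range printed in parentheses
    outer_end: int     # end of the wider range printed in parentheses


# Hypothetical pattern written for the layout seen in this text; it tolerates
# variable spacing around "=", "-" and the parentheses.
_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(line: str) -> Optional[TmSegment]:
    """Return the segment described by one INTEGRAL line, or None."""
    m = _INTEGRAL.search(line)
    if m is None:
        return None
    return TmSegment(float(m.group(1)), *(int(g) for g in m.groups()[1:]))


lines = [
    "INTEGRAL Likelihood =-12.79 Transmembrane 203 - 219 ( 197 - 225)",
    "INTEGRAL Likelihood = -8.86 Transmembrane 73 - 89 ( 69 - 102)",
    "INTEGRAL Likelihood = -7.38 Transmembrane 38 - 54 ( 35 - 56)",
    "INTEGRAL Likelihood = -1.12 Transmembrane 103 - 119 ( 103 - 119)",
]
segments = [s for s in map(parse_integral, lines) if s is not None]
# The segment with the most negative likelihood is listed first in the output.
strongest = min(segments, key=lambda s: s.likelihood)
print(strongest)
```

Keeping each segment as a small named record makes it straightforward to sort or filter the predicted spans without re-reading the text.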

Final Results 

bacterial membrane Certainty=0.6116 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10083> which encodes amino acid sequence <SEQ ID
10084> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB59829 GB:AJ012388 hypothetical protein [Lactococcus lactis] 
Identities = 137/231 (59%), Positives = 171/231 (73%), Gaps = 1/231 (0%) 

Query: 1 MIEWIQTHLPNVYQMGWEGAYGWQTAIVQTLYMTFWSFLIGGLMGLLGGLFLVLTSPRGV 60 
M EW PNV +GW G GW TAIVQTLYMTF S LIGGL+GL+ G+ +V+T+ G+
Sbjct: 1 MAEWFAHTFPNVWLGWTGETGWWTAIVQTLYMTFISALIGGLLGLIFGIGvVVTAEDGI 60

Query: 61 IANKLVFGvLDKVVSVFRALPFIILLALIAPVTRVIVGTTLGSPAALVPLSLAVFPFFAR 120 
N+ +F +LDK+VS+ RA PFIILLA IAP+T+++VGT +G AALVPL+L V PF+AR 
Sbjct: 61 TPNRPLFWILDKIVSIGRAFPFIILLAAIAPLTKILVGTQIGVTAALVPLALGVAPFYAR 120

Query: 121 QVQVVLAELDGGVIFAAQASGGTLWDII-WYLREGLPDLIRVSTVTLISLVGETAMAGA 179 

QVQ L +D G +EAAQ G DI+ VYLRE L LIRVSTVTLISL+G TAMAGA 
Sbjct: 121 QVQASLESvDHGKVFAAQTVGADFLDIVFTvYLREELASLIRVSTVTLISLIGLTAMAGA 180 






Query: 180 IGAGGLGSVAITKGYNYSRDDITLVATILILLLIFFIQFLGDFLTRRLSHK 230 

IGAGGLG+ AI+ GYN +D+T ATILIL+ + +Q +GDFL RR+SH+ 
Sbjct: 181 IGAGGLGNTAISYGYNRFANDVTWFATILILIFVLLVQLVGDFLARRVSHR 231

A related DNA sequence was identified in S.pyogenes <SEQ ID 5671> which encodes the amino acid
sequence <SEQ ID 5672>. Analysis of this protein sequence reveals the following:

Possible site: 32 
>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-11.15 Transmembrane 194 - 210 ( 187 - 215) 
INTEGRAL Likelihood =-10.67 Transmembrane 28 - 44 ( 20 - 52) 
INTEGRAL Likelihood = -8.12 Transmembrane 70 - 86 ( 62 - 91) 

Final Results 

bacterial membrane Certainty=0.5458 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB59829 GB:AJ012388 hypothetical protein [Lactococcus lactis] 
Identities = 123/213 (57%) , Positives = 153/213 (71%) , Gaps = 1/213 (0%) 



Query: 9 GDAGWGLAIWNTLYMTIVPFIVGGAIGLLLGLLLVLTGPDGVIENKTICWVIDKVTSIFR 68
G+ GW AI TLYMT + ++GG +GL+ G+ +V+T DG+ N+ + W++DK+ SI R
Sbjct: 19 GETGWWTAIVQTLYMTFISALIGGLLGLIFGIGWVTAEDGITPNRPLFWILDKIVSIGR 78

Query: 69 AIPFVILIAILASFTYLLLRTTLGATAALVPLTFATFPFYARQVQWFSELDKGVIEAAQ 128
A PF+IL+A +A T +L+ T +G TAALVPL PFYARQVQ +D G +EAAQ
Sbjct: 79 AFPFIILLAAIAPLTKILVGTQIGVTAAIiVPLALGVAPFYARQVQASLESVDHGKVEAAQ 138

Query: 129 ASGATFWDIV- KVYLSEGLPDLIRVSTVTLI SLVGETAMAGAIGAGGLGNVAISYGYNRF 187
GA F DIV VYL E L LIRVSTVTLISL+G TAMAGAIGAGGLGN AISYGYNRF
Sbjct: 139 TVGADFLDIVFTvYLREELASLIRVSTVTLISLIGLTAMAGAIGAGGLGNTAISYGYNRF 198

Query: 188 NNDVTWVAT I I I LL I I FAIQFIGDSLTRRFSHK 220
NDVTW ATI+IL+ + +Q +GD L RR SH+
Sbjct: 199 ANDVTWFATILILIFVLLVQLVGDFLARRVSHR 231





An alignment of the GAS and GBS proteins is shown below. 

Identities = 146/212 (68%) , Positives = 172/212 (80%) 



Query: 19 GAYGWQTAIVQTLYMTFWSFLIGGLMGLLGGLFLVLTSPRGVIANKLVFGVLDKWSVFR 78
G GW AI TLYMT F++GG +GLL GL LVLT P GVI NK + V+DKV S+FR
Sbjct: 9 GDAGWGLAIWNTLYMTIVPFIVGGAIGLLLGLLLVLTGPDGVIENKTICWVIDKVTSIFR 68

Query: 79 ALPFIILLALIAPOTRVIVGTTLGSPAALVPLSLAVFPFFARQVQVVLAELDGGVIEAAQ 138
A+PF+IL+A++A T +++ TTLG+ AALVPL+ A FPF+ARQVQW +ELD GVIEAAQ
Sbjct: 69 AIPFVILIAILASFTYLLLRTTLGATAALVPLTFATFPFYARQVQWFSELDKGVIEAAQ 128

Query: 139 ASGGTLWDIIVVYLREGLPDLIRVSTVTLISLVGETAMAGAIGAGGLGSVAITKGYNYSR 198
ASG T WDI+ VYL EGLPDLIRVSTVTLISLVGETAMAGAIGAGGLG+VAI+ GYN
Sbjct: 129 ASGATFWDIVKVYLSEGLPDLIRVSTVTLISLVGETAMAGAIGAGGLGNVAISYGYNRFN 188

Query: 199 DDITLVATILILLLIFFIQFLGDFLTRRLSHK 230
+D+T VATI+ILL+IF IQF+GD LTRR SHK
Sbjct: 189 NDVTWVATI I ILLI I FAIQFIGDSLTRRFSHK 220





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1827 

A DNA sequence (GBSx1934) was identified in S.agalactiae <SEQ ID 5673> which encodes the amino
acid sequence <SEQ ID 5674>. This protein is predicted to be alcohol dehydrogenase, zinc-containing
(Zn-dependent). Analysis of this protein sequence reveals the following:

Possible site: 21

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.92 Transmembrane 71 - 87 ( 69 - 87) 



Final Results

bacterial membrane Certainty=0.2168 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9419> which encodes amino acid sequence <SEQ ID 9420>
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF41759 GB:AE002488 alcohol dehydrogenase, zinc-containing 
[Neisseria meningitidis MC58] 
Identities = 135/246 (54%), Positives = 186/246 (74%), Gaps = 1/246 (0%)

Query: 3 SHCEDGGWILGHLIEGTQAEYVHIPHADGSLYHAPEGVCDDALVMLSDILPTSYEIGVLP 62 

SHC +GGWILG++ I +GTQAEYV P+AD SL P+ V ++ ++LSD LPT++EIGV 
Sbjct: 102 SHCRNGGWILGYMIDGTQAEYVRTPYADNSLVPLPDNVNEEIALLLSDALPTAHEIGVQY 161 


Query: 63 SHIKPGDTVCIVGAGPIGLSALLTAQPYSPAKIIMVDLSQKRLEASKKFGATHTILSTST 122 

+KPGDTV I GAGP+G+SALLTAQ YSPA 11+ D+ + RL+ +K+ GATHTI + ++ 
Sbjct: 162 GDVKPGDTVFIAGAGPVGMSALLTAQLYSPAAIIVCDMDENRLKLAKELGATHTI-NPAS 220 

Query: 123 QEVKEEIDKITKGRGVDWLECVGYPATFDICQNWSIGGHIANVGVHGKPVEFNLQDLW 182

EV +++ I GVD +E VG PAT+++CQ++V GGHIA VGVHG+ V+F L+ LW 
Sbjct: 221 GE VSKQ VFAI VGEDG VDCAI EAVGI PATWNMCQDIVKPGGH I AWGVHGQS VDFKLEKLW 280 

Query: 183 II^ITIOTGLWANTTEMLLEVLETGKIDATQLOTHHFKLSEIEEAYKVFKAAEENNTLK 242
IK + + TGLVNANTTEML++ + + +D T+++THHFK SE+E+AY VFK A EN +K

Sbjct: 281 IKMAITTGLWANTTEMLMKAISSSSVDYTKMLTHHFKFSELEKAYDVFKHAAENQVMK 340 

Query: 243 VIIEND 248 
V++E D 

Sbjct: 341 WLEAD 346

A related DNA sequence was identified in S.pyogenes <SEQ ID 785> which encodes the amino acid 
sequence <SEQ ID 786>. Analysis of this protein sequence reveals the following: 

Possible site: 23
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -5.41 Transmembrane 184 - 200 ( 183 - 203) 

Final Results 

bacterial membrane Certainty=0.3166 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 199/250 (79%) , Positives = 226/250 (89%) 

Query: 1 MPSHCEDGGWILGHLIEGTQAEYVHIPHADGSLYHAPEGVCDDALVMLSDILPTSYEIGV 60 

+ SHC+DGGWILGHLI GTQAEYVHIPHADGSLYHAP+ + D+ALVMLSDILPTSYEIGV 
Sbjct: 114 LSSHCQDGGWILGHLINGTQAEYVHIPHADGSLYHAPDTIDDEALVMLSDILPTSYEIGV 173 

Query: 61 LPSHIKPGDTVCIVGAGPIGLSALLTAQFYSPAKIIMVDLSQKRLEASKKFGATHTILST 120

LPSH+KPGD VCIVGAGP+GL+ALLT QF+SPA IIMVDLSQ RLEA+K FGATHTI S 
Sbjct: 174 LPSHVKPGDNVCIVGAGPVGLAALLTVQFFSPANII^^VDLSQNRLEAAKTFGATHTICSG 233 

Query: 121 STQEVKEEIDKITKGRGVDVVLECTGYPATFDICQNWSIGGHIANVGVHGKPVEFNLQD 180 
S++EVK ID IT GRGVD+ +ECVGYPATFDICQ ++S+GGHIANVGVHGKPV+FNL +
Sbjct: 234 SSEEVKAIIDDITNGRGVDISMECVGYPATFDICQKIISVGGHIANVGVHGKPVDFNLDE 293

Query: 181 LWIKlJITLOT , GLWANTTEMLLEVLETGKIDATQLVTHHFKLSEIEEAYKyFKAaEEI©^ , 240

LWIKNITUSITGLVNANTTEMLL VL+TGKIDAT+L+THHFKLSE+E+AY+ FK A NN 
Sbjct: 294 LWIKNITLISrrGLWAOTTEMLIiNVLKTGKIDA 353 

Query: 241 LKVIIENDIT 250

LKVH+NDI+ 
Sbjct: 354 LKVIIDNDIS 363 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1828 

A DNA sequence (GBSx1935) was identified in S.agalactiae <SEQ ID 5675> which encodes the amino
acid sequence <SEQ ID 5676>. This protein is predicted to be a dehydrogenase fragment. Analysis of this
protein sequence reveals the following:

Possible site: 20

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10.46 Transmembrane 47 - 63 ( 33 - 66)

Final Results

bacterial membrane Certainty=0.5182 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
There is also homology to SEQ ID 786:

Identities = 23/38 (60%) , Positives = 28/38 (73%) 

Query: 7 WRNSNMRAATYLSANELSLTDKAKPQVI KPTDAWXLV 44 
++ NM+AATYLS L L DK KP +IKPTDA+V LV 
Sbjct: 10 YKKLNMKAATYLSTGNLQLIDKPKPVIIKPTDAIVQLV 47

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1829 

A DNA sequence (GBSx1936) was identified in S.agalactiae <SEQ ID 5677> which encodes the amino
acid sequence <SEQ ID 5678>. Analysis of this protein sequence reveals the following:

Possible site: 39 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.1001 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 
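
Each "Final Results" block in this text lists three candidate locations with a certainty value and a verdict. As a rough sketch, assuming only the layout visible in these lines, the block for this example could be read and ranked as shown here. The function name `parse_result` and the pattern are illustrative assumptions; the pattern deliberately tolerates the stray spaces that scanning introduced into some certainty values and ignores anything after the verdict.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for the "Final Results" lines as they appear in this
# text; digits may be split by stray spaces from scanning ("0 . 1001").
_RESULT = re.compile(
    r"bacterial\s+(\w+)\s+Certainty\s*=\s*([\d .]+?)\s*\(([^)]+)\)"
)


def parse_result(line: str) -> Optional[Tuple[str, float, str]]:
    """Return (location, certainty, verdict), or None if the line does not match."""
    m = _RESULT.search(line)
    if m is None:
        return None
    certainty = float(re.sub(r"\s+", "", m.group(2)))  # "0 . 1001" -> 0.1001
    return m.group(1), certainty, m.group(3).strip()


# Sample lines in the layout used above (trailing marker omitted).
block = [
    "bacterial cytoplasm Certainty=0 . 1001 (Affirmative)",
    "bacterial membrane Certainty=0 . 0000 (Not Clear)",
    "bacterial outside Certainty=0. 0000 (Not Clear)",
]
results = [r for r in map(parse_result, block) if r is not None]
# The location with the highest certainty is the predicted compartment.
best = max(results, key=lambda r: r[1])
print(best)
```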

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1830

A DNA sequence (GBSx1937) was identified in S.agalactiae <SEQ ID 5679> which encodes the amino
acid sequence <SEQ ID 5680>. This protein is predicted to be branched chain amino acid transport system
II carrier protein (brnQ). Analysis of this protein sequence reveals the following:

Possible site: 44

>>> Seems to have an uncleavable N-term signal seq



INTEGRAL Likelihood = -9.66 Transmembrane 158 - 174 ( 154 - 177)
INTEGRAL Likelihood = -6.64 Transmembrane 233 - 249 ( 231 - 252)
INTEGRAL Likelihood = -5.20 Transmembrane 37 - 53 ( 30 - 57)
INTEGRAL Likelihood = -3.98 Transmembrane 90 - 106 ( 87 - 108)
INTEGRAL Likelihood = -0.80 Transmembrane 130 - 146 ( 130 - 146)



Final Results 

bacterial membrane Certainty=0.4864 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9417> which encodes amino acid sequence <SEQ ID 9418>
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC00400 GB:AF008220 branch-chain amino acid transporter 
[Bacillus subtilis] 
Identities = 89/250 (35%), Positives = 139/250 (55%), Gaps = 18/250 (7%) 



Query: 1 MDALAS IAFAI IVIQASKQYGAITKKEITSMALKSGAIATFLLAFIYI FVGRIGATSQSL 60
MDALASI F ++V+ A K G K + + +K+G IA L FIY+ + +GATS +
Sbjct: 199 MDAIASIVFGVWVNAVKSKGVTQSKAIiAAACIKAGVIAALGLTFI 258

Query: 61 FKFANGSFIiHNTPI-DGGHvLSQSANFYLGIVGQA^ 119
P+ +G +LS S+++ G +G +LG AI +ACLTT+ GL+T+C +Y
Sbjct: 259 IG PVGEGAKI LSAS SHYLFGSLGNI VLGAAI TVACLTTS IGL VTSCGQY 307

Query: 120 FHKLLPKISHITWATIFTLIAITFYFGGLSEIIRWSLPVLYLLYPLTIVLIFLVFFDQKF 179
F KL+P +S+ TI TL ++ GL++II +S+P+L +YPL IV+I L F D+ F
Sbjct: 308 FSKLIPALSYKIWTIVTLFSLIIANFGLAQI IAFSVPILSAIYPLAI VTIVLSFIDKIF 367

Query: 180 ESSRIVYQTSIAATAVAALYDALSKLGEMTGLFTIPSALTTFFTKWPLGEYSMGWISFA 239
+ RVY + T + ++D+ G G +LF +PL +GW+
Sbjct: 368 KERREVYIACLIGTGLFSILDGIKAAGFSLG SLDVFLNANLPLYSLGIGWVLPG 421

Query: 240 ICGVLVGLIL 249
I G ++G +L
Sbjct: 422 IVGAVIGYVL 431





A related DNA sequence was identified in S.pyogenes <SEQ ID 2233> which encodes the amino acid
sequence <SEQ ID 2234>. Analysis of this protein sequence reveals the following:

Possible site: 21 

>>> Seems to have a cleavable N-term signal seq. 



INTEGRAL Likelihood =-10.83 Transmembrane 235 - 251 ( 228 - 258)
INTEGRAL Likelihood = -8.49 Transmembrane 434 - 450 ( 429 - 454)
INTEGRAL Likelihood = -8.12 Transmembrane 359 - 375 ( 356 - 377)
INTEGRAL Likelihood = -7.86 Transmembrane 150 - 166 ( 144 - 171)
INTEGRAL Likelihood = -6.00 Transmembrane 298 - 314 ( 288 - 316)
INTEGRAL Likelihood = -5.95 Transmembrane 42 - 58 ( 38 - 63)
INTEGRAL Likelihood = -3.35 Transmembrane 336 - 352 ( 335 - 354)
INTEGRAL Likelihood = -2.81 Transmembrane 199 - 215 ( 198 - 218)
INTEGRAL Likelihood = -2.18 Transmembrane 120 - 136 ( 120 - 138)
INTEGRAL Likelihood = -1.81 Transmembrane 390 - 406 ( 390 - 407)
INTEGRAL Likelihood = -1.01 Transmembrane 81 - 97 ( 81 - 97)

Final Results

bacterial membrane Certainty=0.5331 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 161/253 (63%) , Positives = 197/253 (77%) 

Query: 1 MDALASIAFAIIVIQASKQYGAITKKEITSMALKSGAIATFLIiAFIYIFVGRIGATSQSL 60 
MDALAS+ FAI+VI+A+KQ+GA T KE+T + I> SGAIA LLA +YIFVGRIGATSQSL

Sbjct: 202 MDALASLVFA1LVIEATKQFGAKTDKEMTKITLISGAIAILLLALVYIFVGRIGATSQSL 261 

Query: 61 FKFANGSFLLHNTPIDGGHVLSQSANFYLGIVGQAILGTAIFIACLTTATGLITACAEYF 120 
F F +GSF LH P++GG +LS ++ FYLG +GQA L IFLACLTT+TGLIT+ AEYF 
Sbjct: 262 FPFIDGSFTLHGNPVNGGQILSHASRFYLGGIGQAFIAWIFIiACLTTSTGLITSSAEYF 321

Query: 121 HKLLPKISHITWATIFTLIAITFYFGGLSEIIRWSLPVLYLLYPLTIVLIFLVFFDQKFE 180 

HKL+P +SHI WATIFTL++ FYFGGLS II WS PVL+LLYPLT+ LIFLV + F 
Sbjct: 322 HKLVPALSHIAWATI FTLLSAFFYFGGLS VI INWSAP VLFLLYPLTVDLI FL VLAQKCFN 381 


Query: 181 SSRI VYQTSIAATAVAALYDALSKLGEMTGLFTIPSALTTFFTKWPLGEYSMGWISFAI 240 

+ IVY+T+I T + A++DAL L +MTGLF +P A+ TFF K VPLG++SMGWI FA 
Sbjct: 382 NDPIVYRTTIGLTFI PAI FDALLTLSQMTGLFHIiPEAWTFFQKTVPLGQFSMGWI I FAA 441 

Query: 241 CGVLVGLILKKVK 253

G L+GLIL K K 
Sbjct: 442 IGFLIGLILSKTK 454 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1831 

A DNA sequence (GBSx1938) was identified in S.agalactiae <SEQ ID 5681> which encodes the amino
acid sequence <SEQ ID 5682>. This protein is predicted to be 30S ribosomal protein S12 (rpsL). Analysis
of this protein sequence reveals the following:

Possible site: 52

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3698 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9429> which encodes amino acid sequence <SEQ ID 9430>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA78825 GB:Z15120 ribosomal protein S12 [Streptococcus pneumoniae] 
Identities = 64/71 (90%) , Positives = 68/71 (95%) 

Query: 1 MPTINQLWKPRKSKVEKSDSPALNIGYNSHRKVHTKLSAPQKRGVATRVGTMTPKKPNS 60 
MPTINQLVRKPRKSKVEKS SPALN+GYNSH+KV T +S+PQKRGVATRVGTMTPKKPNS

Sbjct: 1 MPTINQLTOKPRKSKVEKSKSPALNVGYNSHKKVQTNVSSPQKRGVATRVGTMTPKKPNS 60 

Query: 61 ALRKFARVRLS 71 
ALRKFARVRLS 
Sbjct: 61 ALRKFARVRLS 71

A related DNA sequence was identified in S.pyogenes <SEQ ID 5683> which encodes the amino acid
sequence <SEQ ID 5684>. Analysis of this protein sequence reveals the following:

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3879 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 44/48 (91%) , Positives = 47/48 (97%) 

Query: 24 LNIGYNSHRKVHTKLSAPQKRGVATRVGTMTPKKPNSALRKFARVRLS 71
LNIGYNSH+KV TK++APQKRGVATRVGTMTPKKPNSALRKFARVRLS
Sbjct: 1 LNIGYNSHKKVQTKMAAPQKRGVATRVGTMTPKKPNSALRKFARVRLS 48
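
The "Identities" and "Positives" counts quoted for this alignment can be checked against the three aligned lines. The sketch below assumes the usual reading of this layout: an identity is a column where query and subject carry the same residue, and a positive is either an identity or a column marked `+` on the middle line. The function name `alignment_stats` is illustrative, and truncating the percentages to whole numbers is an assumption that happens to reproduce the 91% and 97% printed for this block; other alignments in the text may have been computed differently.

```python
from typing import Tuple


def alignment_stats(query: str, midline: str, subject: str) -> Tuple[int, int, int]:
    """Return (identities, positives, aligned_length) for one alignment block."""
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("query, midline and subject must have equal length")
    # Identity: same residue in both sequences (gap characters never count).
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    # Positive: identity, or a conservative substitution marked '+' on the midline.
    positives = identities + midline.count("+")
    return identities, positives, len(query)


# The three lines of the GAS/GBS alignment shown above, without coordinates.
query   = "LNIGYNSHRKVHTKLSAPQKRGVATRVGTMTPKKPNSALRKFARVRLS"
midline = "LNIGYNSH+KV TK++APQKRGVATRVGTMTPKKPNSALRKFARVRLS"
subject = "LNIGYNSHKKVQTKMAAPQKRGVATRVGTMTPKKPNSALRKFARVRLS"

ident, pos, length = alignment_stats(query, midline, subject)
ident_pct = ident * 100 // length  # truncated, not rounded (assumption)
pos_pct = pos * 100 // length
print(f"Identities = {ident}/{length} ({ident_pct}%), Positives = {pos}/{length} ({pos_pct}%)")
```

This check only works where the middle line has kept its column spacing, which is the case for this short block but not for many of the longer alignments in the scanned text.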

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1832

A DNA sequence (GBSx1939) was identified in S.agalactiae <SEQ ID 5685> which encodes the amino
acid sequence <SEQ ID 5686>. This protein is predicted to be purR. Analysis of this protein sequence
reveals the following:

Possible site: 30
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -0.37 Transmembrane 142 - 158 ( 142 - 159)

Final Results

bacterial membrane Certainty=0.1150 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA10902 GB:AJ222642 purR [Lactococcus lactis]
Identities = 143/269 (53%), Positives = 195/269 (72%), Gaps = 1/269 (0%)

Query: 3 LRRSERMWISNYLINNPYTLTSIJSITFASKYG1AAKSSISEDIAIIKKAFEQAQIGDIKTV 62 

++R+ER+V +N+LIN+P + +LN + Y AKSSISED+ IK+ FE +G ++T 
Sbjct: 1 MKRNERLVDFTNFLINHPNQMIJJIjNELSKHYEVAKSSISEDLVFIKRVFENCGVGLVETF 60 


Query: 63 TGASGGVIFTPTIAEAEAKEIVEELRQRLSENDRILPGGYIYLSDLLSTPKMLQSIGRII 122 

G+ GGV FTP I + + E+ +E+ + L E +RILPGGYIYLSD+L TP L+ IG+II 
Sbjct: 61 PGSLGGVRFTPYITDERSLEMSQEIAELLREENRILPGGYIYLSDILGTPSNLRKIGQII 120 

Query: 123 ANAYRGQKIDAvMTVATKGVPLANAVANVIiDVPFVIVRRDLKITEGSTVSVNYASGSSGR 182

A+ Y +++D VMT+ATKG+ P+A +VA +LDVPFVIVRRD K+TEG+T+ +VNY SGSS R 
Sbjct: 121 AHEYHEKQVDVVMTIATKGIPIAQSVAEIIJDvPFVIvRRDPKVTEGATIiNvN™ 180 

Query: 183 IEKMFLSKRSLKPNSRVLIVDDFLKGGGTVSGMISLLSEFDSTLVGVAVFAENA-QEQRE 241 
+E M LSKRSL VLIVDDF+KG GT++GM SL+ EFD L GVAVF E + +R

Sbjct: 181 VENMTLSKRSLSIGQNVLI VDDFMKGAGTINGMRSLVHEFDCLLAGVAVFLEGPFKGERL 240 

Query: 242 KMAYKSLLRVSE IDVKNNRVS VEAGNI FD 270 
YKS+L+V ID+ N + V+ GNIF+ 
Sbjct: 241 IDDYKSILKVDRIDIANRSIDVQLGNIFN 269

A related DNA sequence was identified in S.pyogenes <SEQ ID 5687> which encodes the amino acid
sequence <SEQ ID 5688>. Analysis of this protein sequence reveals the following:

Possible site: 41
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -1.97 Transmembrane 142 - 158 ( 142 - 160)

Final Results

bacterial membrane Certainty=0.1786 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAA10902 GB:AJ222642 purR [Lactococcus lactis]
Identities = 142/269 (52%), Positives = 196/269 (72%), Gaps = 1/269 (0%)

Query: 3 LRRSERMWISNYLINNPYKLTSLNTFATKYEAAKSSISEDIAIIKKAFEEANIGDIDTL 62 

++R+ER+V +N+LIN+P ++ +LN + YE AKSSISED+ IK+ FE +G ++T 
Sbjct: 1 MKRNERLVDFTNFLINHPNQMLNLNELSKHYEVAKSSISEDLVFIKRVFENQGVGLVETF 60 

Query: 63 TGASGGVIFTPSISETEARTIVEDLCQRLSESDRILPGGYIYLSDLLSTPKILQNIGRII 122

G+ GGV FTP I++ + + +++ + L E +RILPGGYIYLSD+L TP L+ IG+II 
Sbjct: 61 PGSLGGVRFTPYITDERSLEMSQEIAELLREENRILPGGYIYLSDILGTPSNLRKIGQII 120 

Query: 123 ANAFKGEKIDAVMTVATKGVPLANAVANILSVPFVIVRRDLKITEGSTVSVNYASASSDR 182 
A+ + ++-+D VMT+ATKG+P+A +VA IL VPFVIVRRD K+TEG+T++VNY S SS R

Sbjct: 121 AHEYHEKQVDvvMTIATKGIPIAQSVAEILDVPFVIVWDPKVTEGATLNvNYMSGSSSR 180 

Query: 183 IEKMFLSKRSLKPNSRVLIvDDFLRGGGTITGMISLLTEFDSTLVGVAVFAENA-QSERE 241 
+E M LSKRSL VLIVDDF+KG GTI GM SL+ EFD L GVAVF E + ER 

Sbjct: 181 VENMTLSKRSLSIGQNVLIVDDFMKGAGTINGMRSLVHEFDCLLAGVAVFLEGPFKGERL 240

Query: 242 QMTFKSLLKVSEIDVKNNNVWEVGNIFD 270 

+KS+LKV ID+ N ++ V++GNIF+ 
Sbjct: 241 IDDYKSILKVDRIDIANRSIDVQLGNIEN 269 


An alignment of the GAS and GBS proteins is shown below. 

Identities = 234/270 (86%) , Positives = 255/270 (93%) 

Query: 1 MKLRRSERMWISNYLINNPYTLTSLNTFASKYGAAKSSISEDIAIIKKAFEQAQIGDIK 60 
MKLRRSERMWISNYLINNPY LTSLNTFA+KY AAKSSISEDIAIIKKAFE+A IGDI

Sbjct: 1 MKLRRSERMWISNYLINNPYKLTSLNTFATKYEAAKSSISEDIAIIKKAFEEANIGDID 60 

Query: 61 TVTGASGGVIFTPTIAEAEAKEIVEELRQRLSENDRILPGGYIYLSDLLSTPKMLQSIGR 120 
T+TGASGGVI FTP+ I +E EA+ IVE+L QRLSE+DRILPGGYIYLSDLLSTPK+LQ+IGR 
Sbjct: 61 TLTGASGGVIFTPSISETEARTIVEDLCQRLSESDRILPGGYIYLSDLLSTPKILQNIGR 120

Query: 121 IIANAYRGQKIDAVMWATKGVPIJOTAVANAnLDVPFVIvRRDLKITEGSTVSvNYASGSS 180 

IIANA++G+KIDAVMTVATKGVPLANAVAN+L VPFVI VRRDLKITEGSTVSVNYAS SS 
Sbjct: 121 IIANAFKGEKIDAVMWATKGVPLANAVANILSVPFVIvRRDLKITEGSTVSVNYASASS 180 


Query: 181 GRIEKMFLSKRSLKPNSRVLIVDDFLKGGGTVSGMISLLSEFDSTLVGVAVFAENAQEQR 240 

RIEKMFLSKRSLKPNSRVLIVDDFLKGGGT++GMISLL+EFDSTLVGVAVFAENAQ +R 
Sbjct: 181 DRIEKMFLSKRSLKPNSRVLIVDDFLKGGGTITGMISLLTEFDSTLVGVAVFAENAQSER 240 

Query: 241 EKMAYKSLLRVSE IDVKNNRVSVEAGNI FD 270

E+M +KSLL+VSEIDVKNN V VE GNIFD 
Sbjct: 241 EQMTFKSLLKVSEIDVKNNNVWEVGNIFD 270 






Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1833

A DNA sequence (GBSx1940) was identified in S.agalactiae <SEQ ID 5689> which encodes the amino
acid sequence <SEQ ID 5690>. This protein is predicted to be cmp-binding-factor 1. Analysis of this
protein sequence reveals the following:

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 



Final Results

bacterial cytoplasm Certainty=0.1753 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC44803 GB:U21636 cmp-binding-factor 1 [Staphylococcus aureus] 
Identities = 140/310 (45%) , Positives = 195/310 (62%) , Gaps = 6/310 (1%) 



Query: 3 INQMKKDELFEGFYLIKKAEVRKTRAGKDFIAFTFQDDTGEISGNMWDAQTYNVEEFVAG 62
I + + + F+L+ KA T GKD++ QD +GEI W A ++
Sbjct: 4 IENLNPGDSVDHFFLVHKATQGVTAQGKDYMTLHLQDKSGEIEAKFWTATKNDMATIKPE 63

Query: 63 KIVHMKGRREVYNGTPQ--VNQITLRNIKDGEPNDPRDFKEKPPINVDNVREYMEQMLFK 120
+IVH+KG Y G Q VNQI L +D + F + P++ ++E + L
Sbjct: 64 EIVHVKGDIINYRGNKQMKMQIRLATTEDQLRTE--QFVDGAPLSPAEIQEEISHYLLD 121

Query: 121 IENATWQRVvRALYRKYNKEFFTYPAAKTNHHAFESGIAYHTATMVRIADSIGDIYPEIjN 180
IENA QR+ R L +KY + F+TYPAA ++HH F SGL+YH TM+R+A SI DIYP IiN
Sbjct: 122 IENANLQRITRHLLKKYQERFYTYPAASSHHHNFASGIjSYHvLTMLRIAKSICDIYPLI^ 181

Query: 181 KSLMFAGIMLHDLAKVIELSGPDNTEYTIRGNLIGHISLIDEELTKILAELNIDDTKEEV 240
KSL+++GI+LHD+ KV ELSGP T YT+ GNL+GHIS+ +E+ + ELNI+ EE+
Sbjct: 182 KSLLYSGIILHDIGKVRELSGPVATSYTVEGMjLGHISIASDEVVEAARELNIEG- -EEI 239

Query: 241 TVLRHVILSHHGQLEYGSPvRPRIMFAEIIHMimiDANMMMTTALNRvNEGEMlTOIF 300
+LRH+ILSHHG+LEYGSP P + EAEI+ IDNIDA MM A + ++G+ T++IF
Sbjct: 240 MLLRHMILSHHGKLEYGSPKLPYLKEAEILCYIDNIDARMNMFEKAYKKTDKGQFTDKIF 299

Query: 301 AMDNRSFYKP 310
++NR FY P
Sbjct: 300 GLENRRFYNP 309





A related DNA sequence was identified in S.pyogenes <SEQ ID 5691> which encodes the amino acid 
sequence <SEQ ID 5692>. Analysis of this protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1822 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 275/311 (88%) , Positives = 300/311 (96%) 

Query: 1 MKINQMKKDELFEGFYLIKKAETOKTRAGKDFIAFTFQDDTGEISGNMWDAQTYNVEEFV 60 

MKINQMKKD+LFEGFYLIK AEVRKTRAGKDFI+ TFQDDTGE I SGN+WDAQ YNVEEF 
Sbjct: 1 MKINQMKKDQLFEGFYLIKSAEVRKTRAGKDFISLTFQDDTGEISGNLWDAQPYNVEEFT 60 

Query: 61 AGKIVHMKGRREVYNGTPQvNQITLRNIKDGEPNDPRDFKEKPPINVDNVREYMEQMLFK 120 

AGK+V MKGRREVYNGTPQVNQITLRN++ •GEPNDP+DFKEK P++V VR+Y+EQMLFK 
Sbjct: 61 AGKOTFMKGRREVYNGTPQTOQITLRNVRPGEPNDPKDFKEKAPVSVTEVRDYLEQMLFK 120 



Final Results 

bacterial cytoplasm 

bacterial membrane 

bacterial outside 
Query: 121 IENATWQRVVRALYRKYNKEFFTYPi^KTlMHAFESGIAYHTAT^IVRLADSIGDIYPEIJS^ 180

IENATWQR+VRALYRKY+KEF+TYPAAKTNHHAFESGLAYHTATMTOLADSIGDI 
Sbjct: 121 IENATWQRIVRALYRKYDKEFYTYPAAKTNHHAFESGIAYHTATMTOIADSIGDIYPDLN 180 

Query: 181 KSLMFAGIMLHDLAKVIELSGPDNTEYTIRGNLIGHISLIDEELTKILAEIiNIDDTKEEV 240

KSL+FAGIMLHDLAKVIEL+GPDNTEYT+RGNLIGHISLI+EE+TK+++EL IDDTKEEV 
Sbjct: 181 KSLLFAGIMLHDLAKVIELTGPDNTEYTVRGNLIGHISLINEEITKVISELQIDDTKEEV 240 

Query: 241 TVLRHVILSHHGQLEYGSPTOPRIMEMIIHMIDNIDAM^mTTAl^VlffiGEMTNRIF 300 
1 0 VLRHVILSHHGQLEYGSPWPRIMEAEIIHMIDNIDA1W1MMTTAL+RV+EGEMTNRIF 

Sbjct: 241 IVLRHVILSHHGQLEYGSPWPRIMFAEIIHMIDNIDA^M^1MMTTALSRVSEGEMTNRIF 300 

Query: 301 AMDNRSFYKPN 311 
AMDNRSFYKPN 
15 Sbjct: 301 AMDNRSFYKPN 311 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1834 

A DNA sequence (GBSx1941) was identified in S.agalactiae <SEQ ID 5693> which encodes the amino
acid sequence <SEQ ID 5694>. Analysis of this protein sequence reveals the following:

Possible site: 21 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-14.59 Transmembrane 2 - 18 ( 1-22) 



Final Results 

bacterial membrane Certainty=0.6838 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
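
The "Final Results" blocks in these examples list one certainty value per candidate location, and the reported location is the one with the highest value. A minimal Python sketch of reading such a block is given here; the function names and the regular expression are illustrative assumptions, not part of the prediction program, and the pattern deliberately tolerates stray spaces inside the number.

```python
import re

# Hypothetical helper (not part of the original analysis software).
# Matches lines such as:
#   bacterial membrane Certainty=0.6838 (Affirmative) < succ>
# and tolerates stray spaces inside the number, e.g. "0. 6838".
LINE_RE = re.compile(
    r"^(?P<site>bacterial \w+)\W*Certainty=(?P<value>[\d.\s]+)\((?P<call>[^)]+)\)"
)


def parse_final_results(text):
    """Return a list of (site, certainty, call) tuples found in the text."""
    results = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            value = float(match.group("value").replace(" ", ""))
            results.append((match.group("site"), value, match.group("call")))
    return results


def best_site(results):
    """Pick the entry with the highest certainty value."""
    return max(results, key=lambda entry: entry[1])


example = """\
bacterial membrane Certainty=0.6838 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
"""
parsed = parse_final_results(example)
print(best_site(parsed))
```

Only the listed values are compared; the sketch makes no claim about how the certainty values themselves are derived.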

A related DNA sequence was identified in S.pyogenes <SEQ ID 5695> which encodes the amino acid 
sequence <SEQ ID 5696>. Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-12.05 Transmembrane 3 - 19 ( 1-26)

Final Results

bacterial membrane Certainty=0.5819 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 309/424 (72%), Positives = 370/424 (86%), Gaps = 3/424 (0%)

Query: 1 MLVIILIIVLASLTVTIISYQKMTELTKSVEKQLEDNADNLSDQLTYQIEVAQKDQILTL 60

+++ +L++VL L ++ K+ L + + LE NADNLSDQ+TYQ+ + A K Q+L L
Sbjct: 3 LILFLLVLVLLGLGAYLLF--KVNGLQHQLAQTLEGNADNLSDQMTYQLDTANKQQLLEL 60

Query: 61 TNQLNRMQQEIYQLLTDMRTELNQHLTESRDRSDKRLELINSNLSQSVQKMQDSNEKRLD 120

T +NR Q +YQ LTD+R L++ L++SRDRSDKRLE IN ++QS++ MQ+SNEKRL+
Sbjct: 61 TQLMNRQQAGLYQQLTDIRDVLHRSLSDSRDRSDKRLEKINQQVNQSLKNMQESNEKRLE 120

Query: 121 QMRQTVEEKLEKTLQTRLQTSFETVSRQLESVNCGLGEMKTVAQDVGTLNKVLSNTKTRG 180

+MRQ VEEKLE+TL+ RL SF++VS+QLESVN+GLGEM++VAQDVGTLNKVLSNTKTRG
Sbjct: 121 KMRQIVEEKLEETLKNRLHASFDSVSKQLESVNKGLGEMRSVAQDVGTLNKVLSNTKTRG 180

Query: 181 ILGELQLGQIIEDIMTVSQYEREFPTVSGSSERVEYAIKLPGNGQGDYIYLPIDSKFPLE 240

ILGELQLGQIIEDIMT SQYEREF TVSGSSERVEYAIKLPGNGQG YIYLPIDSKFPLE
Sbjct: 181 ILGELQLGQIIEDIMTSSQYEREFVTVSGSSERVEYAIKLPGNGQGGYIYLPIDSKFPLE 240

Query: 241 DYYRLEDAYELGDKVQIELYRKSLLASIRKFAKDINNKYLNPPETTNFGIMFLPTEGLYS 300

DYYRLEDAYE+GDK+ IE RK+LLA+I++FAKDI+ KYLNPPETTNFG+MFLPTEGLYS
Sbjct: 241 DYYRLEDAYEVGDKIAIEASRKALLAAIKRFAKDIHKKYLNPPETTNFGVMFLPTEGLYS 300

Query: 301 EVVRNATFFDSLRRDENIWAGPSTLSALLNSLSVGFKTLNIQKNANDISKILGNVKVEF 360

EVVRNA+FFDSLRR+ENIWAGPSTLSALLNSLSVGFKTLNIQKNA+DISKILGNVK+EF
Sbjct: 301 EVVRNASFFDSLRREENIWAGPSTLSALLNSLSVGFKTLNIQKNADDISKILGNVKLEF 360

Query: 361 GKFGGMLSKAQKQLNTASKSIDSLLTTRTNAIIRVLNTVEEHQDQATTSLLNLPITEEEE 420

KFGG+L+KAQKQ+NTA+ ++D L++TRTNAI+R LNTVE +QDQAT SLLN+P+ EEE
Sbjct: 361 DKFGGLLAKAQKQMOTANOTLDQLISTRTmiWAIMVETYQDQATKSLIiNM 420

Query: 421 INEN 424

NEN
Sbjct: 421 -NEN 423

SEQ ID 5694 (GBS88) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 18 (lane 2; MW 48kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1835 

A DNA sequence (GBSx1942) was identified in S.agalactiae <SEQ ID 5697> which encodes the amino
acid sequence <SEQ ID 5698>. Analysis of this protein sequence reveals the following:

Possible site: 44

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2722 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13453 GB:Z99112 yloS [Bacillus subtilis] 
Identities = 75/217 (34%), Positives = 109/217 (49%), Gaps = 12/217 (5%)

Query: 1 MTKIALFAGG------DLTYFEYDFDYFVGIDRGSLFLLKNGLSLDMAVGDFDSITEDEL 54

M I + AGG DLT + + ++G+D+G++ LL G+ A GDFDSITE E
Sbjct: 1 MKTINIVAGGPKNLIPDLTGYTDEHTLWIGVT3KGTVTLLDAGIIPVEAFGDFDSITEQER 60

Query: 55 LYIKHYCSNIVSASAEKNDTDTELALKTIFKEFPEAQVTVFGAFGGRIDHMMSNIFLPSD 114

I+ + AEK+ TD +LAL ++ P+ + +FG GGR DH + NI L
Sbjct: 61 RRIEKAAPALHVYQAEKDQTDLDLALDWALEKQPDI-IQIFGITGGRADHFLGNIQLLYK 119

Query: 115 RDLEPFMSQIRLKDEQNIVTYLPSGKNQVSRIEGMSYVSFMPESES--TLQISGAKYELN 172

+IRL D+QN + P G+ + + E Y+SF+P SE L ++G KY LN
Sbjct: 120 GVKTNI--KIRLIDKQNHIQMFPPGEYDIEKDENKRYISFIPFSEDIHELTLTGFKYPLN 177

Query: 173 KSNY-FKKKMYSSNEFMTSPIEVELKDGYLIIIYSKD 208

+ + SNE + S G LI+I S D
Sbjct: 178 NCHITLGSTLCISNELIHSRGTFSFAKGILIMIRSTD 214

A related DNA sequence was identified in S.pyogenes <SEQ ID 5699> which encodes the amino acid 
sequence <SEQ ID 5700>. Analysis of this protein sequence reveals the following: 

Possible site: 55

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2467 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 130/208 (62%), Positives = 166/208 (79%)

Query: 1 MTKIALFAGGDLTYFEYDFDYFVGIDRGSLFLLKNGLSLDMAVGDFDSITEDELLYIKHY 60

M+K+ALFAGGDL+Y DFDYFVGIDRGSLFLL+NGL L+MAVGDFDS+++ IK
Sbjct: 1 MSKVALFAGGDLSYISRDFDYFVGIDRGSLFLLENGLPLNMAVGDFDSVSQKAFTDIKEK 60

Query: 61 CSNIVSASAEKNDTDTELALKTIFKEFPEAQVTVFGAFGGRIDHMMSNIFLPSDRDLEPF 120

++A EKNDTDTELALK +F FPEA+VT+FGAFGGR+DH++SNIFLPSD + PF
Sbjct: 61 AELFITAHPEKNDTDTELALKEVFARFPEAEVTIFGAFGGRMDHLLSNIFLPSDPGIAPF 120

Query: 121 MSQIRLKDEQNIVTYLPSGKNQVSRIEGMSYVSFMPESESTLQISGAKYELNKSNYFKKK 180

M+QI L+D+QN++TY P+G++ + + EGM+YV+FM E E+ L I+GAK+EL + N+FKKK
Sbjct: 121 MAQIALRDQQNMITYRPAGQHLIHQEEGMTYVAFMAEGEADLTITGAKFELTQDNFFKKK 180

Query: 181 MYSSNEFMTSPIEVELKDGYLIIIYSKD 208

+YSSN F+ PI V L GYLIII SKD
Sbjct: 181 IYSSNAFIHQPITVSLPSGYLIIIQSKD 208

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
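
The alignment summary for this example reports 130/208 identities (62%) and 166/208 positives (79%). A minimal Python sketch of reading such a summary line and recomputing the percentages from the raw counts follows. It assumes the printed percentages are truncated rather than rounded, which reproduces these two figures but is not confirmed as the alignment program's actual rule; the function name and pattern are illustrative.

```python
import re

# Illustrative pattern for "Identities = 130/208 (62%)" style fields.
HEADER_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)")


def parse_header(header):
    """Return {field: (count, total, truncated_percent)} for each field found."""
    stats = {}
    for name, count, total in HEADER_RE.findall(header):
        count, total = int(count), int(total)
        # Assumption: the percentage is truncated (integer division), not rounded.
        stats[name] = (count, total, (100 * count) // total)
    return stats


stats = parse_header("Identities = 130/208 (62%), Positives = 166/208 (79%)")
print(stats)
```

Rounding to the nearest integer would give 80% for 166/208, so truncation is the simpler assumption for this header.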



Example 1836 

A DNA sequence (GBSx1943) was identified in S.agalactiae <SEQ ID 5701> which encodes the amino
acid sequence <SEQ ID 5702>. This protein is predicted to be ribulose-phosphate 3-epimerase (rpe).
Analysis of this protein sequence reveals the following:

Possible site: 18 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.59 Transmembrane 124 - 140 ( 124 - 141) 

Final Results 

bacterial membrane Certainty=0.1638 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06221 GB:AP001515 unknown conserved protein [Bacillus halodurans] 
Identities = 113/211 (53%), Positives = 153/211 (71%)

Query: 5 KIAPSILAADYANFANELKRIEETTAEYVHIDIMDGQFVPNISFGADWSSMRKHSKLVF 64

KIAPSIL+AD+AN NE++ +E A+Y+H+D+MDG FVPNI+ G +V ++R + h
Sbjct: 3 KIAPSILSADFANLGNEIQDVERGGADYIHVDVMDGHFVPNITIGPLIVDAIRPVTTLPL 62

Query: 65 DCHLMWDPERYIEAFAQAGADIMTIHVEATKHIHGALQKIKEAGMKAGWINPGTPVES 124

D HLM+ P+ YI AFA+AGADI+T+HVEA H+H L IKE+G+KAGW+NP TPV S
Sbjct: 63 DVHLMIEQPDGYIPAFAKAGADIITVHVEACPHLHRTLHLIKESGVKAGVVLNPATPVSS 122

Query: 125 LIPILDLVDQILIMTWPGFGGQAFIPEMMSKVKTVAAWRKEYGHHYDIEVDGGIDNTTI 184

+ +L VD +L MTVNPGFGGQ FIP ++ K+K +A+ +KE G ++IEVDGG++ T
Sbjct: 123 IQHVLSDVDMVLFMTWPGFGGQRFIPSVLPKLKELASLKKEQGLTFEIEVDGGVNEETA 182

Query: 185 KAAAEAGANVFVAGSYLFKASDLPAQVETLR 215

K EAGANV VAGS +F D A ++ +R
Sbjct: 183 KQCVEAGANVLVAGSAVFNEEDRAAAIKGIR 213

A related DNA sequence was identified in S.pyogenes <SEQ ID 5703> which encodes the amino acid
sequence <SEQ ID 5704>. Analysis of this protein sequence reveals the following:

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.0072 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 183/219 (83%), Positives = 198/219 (89%)

Query: 1 MSTNKIAPSILAADYANFANELKRIEETTAEYVHIDIMDGQFVPNISFGADWSSMRKHS 60

MST KIAPSILAADYANFA+EL RIEET AEYVHIDIMDGQFVPNISFGADW+SMRKHS
Sbjct: 1 MSTLKIAPSILAADYANFASELARIEETDAEYVHIDIMDGQFVPNISFGADWASMRKHS 60

Query: 61 KLVFDCHLMWDPERYIEAFAQAGADIMTIHVEATKHIHGALQKIKEAGMKAGWINPGT 120

KLVFDCHLMWDPERY+EAFAQAGADIMTIH E+T+HIHGALQKIK AGMKAGWINPGT
Sbjct: 61 KLVFDCHLIWVDPERYVEAFAQAGADIMTIHTESTRHIHGALQKIKAAGMKAGWINPGT 120

Query: 121 PVESLIPILDLVDQILIMTVNPGFGGQAFIPEMMSKVKTVAAWRKEYGHHYDIEVDGGID 180

P +L P+LDLVDQ+LIMTVNPGFGGQAFIPE + KV TVA WR E G +DIEVDGG+D
Sbjct: 121 PATALEPLLDLVDQVLIMTVNPGFGGQAFIPECLEKVATVAKWRDEKGLSFDIEVDGGVD 180

Query: 181 NTTIKAAAEAGANVFVAGSYLFKASDLPAQVETLRVALD 219

N TI+A EAGANVFVAGSYLFKASDL +QV+TLR AL+
Sbjct: 181 NKTIRACTEAGANVFVAGSYLFKASDLVSQVQTLRTALN 219

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1837 

A DNA sequence (GBSx1944) was identified in S.agalactiae <SEQ ID 5705> which encodes the amino
acid sequence <SEQ ID 5706>. Analysis of this protein sequence reveals the following:

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2098 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13451 GB:Z99112 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 148/296 (50%), Positives = 202/296 (68%), Gaps = 14/296 (4%)

Query: 2 QGRIVKSLAGFYYV ESDGWYQTRARGNFRKKGQIPYVGDWVEFSSQDQSEGYILS 57

+G+I+K+L+GFYYV E V Q R RG FRK P VGD+V + +++ EGY++
Sbjct: 3 EGKIIKALSGFYYVLDESEDSDKVIQCRGRGIFRKNKITPLVGDYWYQAENDKEGYLME 62

Query: 58 IEERKNSLWPPIVNIDQAWIMSAKEPDFNANLLDRFLVLLEYKMIQPIIYISKLDLLD 117

I+ER N L+RPPI N+DQAV++ SA +P F+ LLDRFLVL+E IQPII I+K+DL++
Sbjct: 63 IKERTIffiLIRPPICNVDQAVLVFSAVQPSFSTALLDRFL^ 122

Query: 118 DLWIDDIR EHYQNIGY-VFCYSQEE LLPLLANKVTVFMGQTGVGKSTLLN 167

D D I+ E Y+NIGY V+ S ++ ++P +K TVF GQ+GVGKS+LLN
Sbjct: 123 DQDTEDTIQAYAEDYRNIGYDVYLTSSKDQDSLADIIPHFQDKTTVFAGQSGVGKSSLLN 182

Query: 168 KIAPELKLETGEISGSLGRGRHTTRAVSFYNVHKGKIADTPGFSSLDYEVDNAEDLNESF 227

I+PEL L T EIS LGRG+HTTR V + G +ADTPGFSSL++ E+L +F
Sbjct: 183 AISPELGLRTNEISEHLGRGKHTTRHVELIHTSGGLVADTPGFSSLEFTDIEEEELGYTF 242

Query: 228 PELRRLSHFCKFRSCTHTHEPKCAVKEALTQGQLWQVRYDNYLQFLSEIESRRETY 283

P++R S CKFR C H EPKCAVK+A+ G+L Q RYD+Y++F++EI+ R+ Y
Sbjct: 243 PDIREKSSSCKFRGCLHLKEPKCAVKQAVEDGELKQYRYDHYVEFMTEIKDRKPRY 298

A related DNA sequence was identified in S.pyogenes <SEQ ID 5707> which encodes the amino acid 
sequence <SEQ ID 5708>. Analysis of this protein sequence reveals the following: 

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2290 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 230/290 (79%), Positives = 257/290 (88%)

Query: 1 MQGRIVKSLAGFYYVESDGVVYQTRARGNFRKKGQIPYVGDWVEFSSQDQSEGYILSIEE 60

+QG+I+KSLAGFYYVES+G VYQTRARGNFRK+G+ PYVGD V+FS++D SEGYIL+I
Sbjct: 1 LQGKIIKSLAGFYYVESEGQVYQTRARGNFRKRGETPYVGDIVDFSAEDNSEGYILAIHP 60

Query: 61 RKNSLVRPPIVNIDQAWIMSAKEPDFNANLLDRFLVLLEYKMIQPIIYISKLDLLDDLV 120

RKNSLVRPPIVNIDQAWIMSAKEP+FN+NLLDRFL+LLE+K I P++YISK+DLLD
Sbjct: 61 RKNSLVRPPIVNIDQAWIMSAKEPEFNSNLLDRFLILLEHKAIHPWYISKMDLLDSPE 120

Query: 121 VIDDIREHYQNIGYVFCYSQEELLPLLANKVTVFMGQTGVGKSTLLNKIAPELKLETGEI 180

II YQ IGY F S EELLPLLA+K+TVFMGQTGVGKSTLLN+IAPEL LE GEI
Sbjct: 121 EIKAIGRQYQAIGYDFVTSLEELLPLIADKITVFMGQTGVGKSTLITOIAPELALEIGEI 180

Query: 181 SGSLGRGRHTTRAVSFYNVHKGKIADTPGFSSLDYEVDNAEDLNESFPELRRLSHFCKFR 240

S SLGRGRHTTRAVSFYN H GKIADTPGFSSLDY++ NAEDLNE+FPELRRLSH CKFR
Sbjct: 181 SDSLGRGRHTTRAVSFYNTHGGKIADTPGFSSLDYDIANAEDLNEAFPELRRLSHECKFR 240

Query: 241 SCTHTHEPKCAVKEALTQGQLWQVRYDNYLQFLSEIESRRETYKKVIKRK 290

SCTHTHEPKCAVK AL G+LW VRY++YLQFLSEIE+RRETYKKVIKRK
Sbjct: 241 SCTHTHEPKCAVKAALETGELWPVRYEHYLQFLSEIENRRETYKKVIKRK 290

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1838 

A DNA sequence (GBSx1945) was identified in S.agalactiae <SEQ ID 5709> which encodes the amino
acid sequence <SEQ ID 5710>. This protein is predicted to be rRNA. Analysis of this protein sequence
reveals the following:

Possible site: 17 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.43 Transmembrane 259 - 275 ( 259 - 275) 

Final Results

bacterial membrane Certainty=0.1171 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15937 GB:Z99124 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 95/278 (34%), Positives = 147/278 (52%), Gaps = 16/278 (5%)

Query: 14 SYFACPKCONPLIKESN-SLKCSDN-HCFDLSKFG5YVNLLGGKKVDEHYDKKSFENR-OL 70

[match line illegible]
Sbjct: 8 SMFRCPLCDSSMDAASGKSLICTERGHTFDLSRHGYWFLT-KPVKTSYGMLFEARSRL 66

Query: 71 VLENGYYNHI LEAI SKVLENNSQFH SVLDIGCGEGFYSRQLVNKHEKTFIiAF D 123

[match line illegible]
Sbjct: 67 IGECGFFDPLHDAIAELISHPKSGHEAFTILDSGCGEGSHLNALCGFDYAGKAAIGTGID 126

Query: 124 ISKDSIOIiAAKSDOSRLVKWFVSDLANLPIODSSIDIILDIFSPANYKEFRRVLSDDGIL 183

[match line illegible]
Sbjct: 127 LSKDGILKASKAFKDLM--WAVADVARAPFHDRQFDVVLSIFSPSNYAEFHRLLKNDGML 184

Query: 184 [sequence illegible] 243

[match line illegible]
Sbjct: 185 IKVVPRSDYLIELROFLYTDSPRRTYSNT^ 244

Query: 244 FIDMTPLLFSVDKTTIDW---ASISEITVGALIVIGKK 278

+ MTPL +S K + ++ITV I+IG K
Sbjct: 245 LLKMTPLAWSAPKDRVSLLKEMKSADITVDVDILIGMK 282





No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1839 

A DNA sequence (GBSx1946) was identified in S.agalactiae <SEQ ID 5711> which encodes the amino
acid sequence <SEQ ID 5712>. This protein is predicted to be dimethyladenosine transferase (ksgA).
Analysis of this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3257 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11818 GB:Z99104 dimethyladenosine transferase [Bacillus subtilis] 
Identities = 157/284 (55%), Positives = 215/284 (75%), Gaps = 2/284 (0%)

Query: 3 IADKTVTRAILERHGFTFKKSFGQNFLTDTNILQKIVDTAEIDKGVNVIEIGPGIGALTE 62

IA T+ IL+++GF+FKKS GQNFL DTNIL +IVD AE+ + VIEIGPGIGALTE
Sbjct: 5 IATPIRTKEILKKYGFSFKKSLGQNFLIDTNILNRIVDHAEVTEKTGVIEIGPGIGALTE 64

Query: 63 FLAENAAEVMAFEIDDRLIPILADTLARFDNVQVVNQDILKADLQTQIQA-FKNPDLPIK 121

LA+ A +V+AFEID RL+PIL DTL+ ++NV V++QD+LKAD+++ I+ F++ D I
Sbjct: 65 QIAKRAKKWAFEIDQRLLPILKDTLSPYENVTVTHQDVLKADVKSVIEEQFQDCD-EIM 123

Query: 122 VVANLPYYITTPILMHLIESKIPFAEFVVMIQKEVADRISAMPNTKAYGSLSIAVQYYMT 181

WANLPYY+TTPI+M L+E +P WM+QKEVA+R++A P++K YGSLSIAVQ+Y
Sbjct: 124 WANLPYYVTTPIIMKLLEEHLPLKGIVVMLQKEVAERMAADPSSKEYGSLSIAVQFYTE 183

Query: 182 AKVSFIVPRTVFVPAPNVDSAILKMVRRDQPVVSVQDEDFFFRVSKVAFVHRRKTLWNNL 241

AK IVP+TVFVP PNVDSA+++++ RD P V V++E FFF++ K +F RRKTL NNL
Sbjct: 184 AKTVMIVPKWFVPQP1WDSAVIRLILRIX3PAVDVENESFFFQLIKASFAQRRKTLLNNL 243

Query: 242 TSHFGKSEDTKAKLEKALEIAKIKPSIRGEALSIPDFASLADAL 285

++ + + K+ +E+ LE I RGE+LSI +FA+L++ L
Sbjct: 244 VNNLPEGKAQKSTIEQVLEETNIDGKRRGESLSIEEFAALSNGL 287

A related DNA sequence was identified in S.pyogenes <SEQ ID 5713> which encodes the amino acid
sequence <SEQ ID 5714>. Analysis of this protein sequence reveals the following:

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2420 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 257/290 (88%), Positives = 275/290 (94%)

Query: 1 MRIADKTVTRAILERHGFTFKKSFGQNFLTDTNILQKIVDTAEIDKGVNVIEIGPGIGAL 60

MRIAD +VT+A+L+RHGFTFKKSFGQNFLTDTNILQKIVDTAEID+ VNVIEIGPGIGAL
Sbjct: 9 MRIADYSVTKAVLDRHGFTFKKSFGQNFLTDTNILQKIVDTAEIDQNVNVIEIGPGIGAL 68

Query: 61 TEFLAENAAEVMAFEIDDRLIPILADTLARFDNVQVVNQDILKADLQTQIQAFKNPDLPI 120

TEFLAENAAEVMAFEIDDRL+PILADTL FDNVQVVNQDILKADLQTQI+ FKNPDLPI
Sbjct: 69 TEFLAENAAEVMAFEIDDRLVPILADTLRDFDNVQVVNQDILKRDLQTQIKQFKNPDLPI 128

Query: 121 KVVANLPYYITTPILMHLIESKIPFAEFVVMIQKEVADRISAMPNTKAYGSLSIAVQYYM 180

KVVANLPYYITTPILMHLIESKIPF EFVVM+Q+EVADRISA PNTKAYGSLSIAVQYYM
Sbjct: 129 KVVANLPYYITTPILMHLIESKIPFQEFVVMMQREVADRISAEPNTKAYGSLSIAVQYYM 188

Query: 181 TAKVSFIVPRTVFVPAPNVDSAILKMVRRDQPVVSVQDEDFFFRVSKVAFVHRRKTLWNN 240

TAKV+FIVPRTVFVPAPNVDSAILKMVRRDQP++ V+DEDFFFRVS+++FVHRRKTLWNN
Sbjct: 189 TAKVAFIVPRTVFVPAPNVDSAILKMVRRDQPLIKVKDEDFFFRVSRLSFVHRRKTLWNN 248

Query: 241 LTSHFGKSEDTKAKLEKALEIAKIKPSIRGEALSIPDFASLADALKEVGI 290

LTSHFGKSED KAKLEK L +A IKPSIRGEALSI DF LADALKEVG+
Sbjct: 249 LTSHFGKSEDIKAKLEKGLALADIKPSIRGEALSIQDFGKLADALKEVGL 298





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1840 

A DNA sequence (GBSx1947) was identified in S.agalactiae <SEQ ID 5715> which encodes the amino
acid sequence <SEQ ID 5716>. Analysis of this protein sequence reveals the following:

Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0736 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1841 

A DNA sequence (GBSx1948) was identified in S.agalactiae <SEQ ID 5717> which encodes the amino
acid sequence <SEQ ID 5718>. Analysis of this protein sequence reveals the following:

Possible site: 59

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3031 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11817 GB:Z99104 similar to hypothetical proteins [Bacillus subtilis]
Identities = 81/179 (45%), Positives = 117/179 (65%), Gaps = 4/179 (2%)

Query: 7 IQEVIVVEGKDDTANLRRFYNVDTYETRGSAIDEDDLERIERLHNLRGVIVFTDPDYNGE 66

I+E+IWEG+DDTA ++ + DT ET GSAID+ +++I RGVI+ TDPD+ GE
Sbjct: 3 IKEIIWEGRDDTARIKLAVDADTIETNGSAIDDHVIDQIRLAQKTRGVIILTDPDFPGE 62

Query: 67 RIRKIIMNAIPTVRHAFLNRDEAKPGSKTKGRSLGVEHASFEDLQKALSKVTQHFDDEDH 126

+IRK I A+P +HAFL + AKP +K R +GVEHAS E ++ L V + + +
Sbjct: 63 KIRKTISEAVPGCKHAFLPKHLAKPKNK---RGIGVEHASVESIRACLENVHEEMEAQPS 119

Query: 127 FDITQADLIRWGFITASDSRKRREYLGNQLRIGYSNGKQLLKRLRLFGVTKAEVEECME 185

DI+ DLI G I ++ RRE LG+ L+IGY+NGKQL KRL++F + K++ ++
Sbjct: 120 -DISAEDLIHAGLIGGPAAKCRRERLGDLLKIGYTNGKQLQKRLQMFQIKKSDFMSALD 177

A related DNA sequence was identified in S.pyogenes <SEQ ID 5719> which encodes the amino acid
sequence <SEQ ID 5720>. Analysis of this protein sequence reveals the following:

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1474 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 146/187 (78%), Positives = 165/187 (88%)

Query: 1 MMKKIDIQEVIVVEGKDDTANLRRFYNVDTYETRGSAIDEDDLERIERLHNLRGVIVFTD 60

+ +KI+IQEV+VVEGKDDTANLRRFY VDTYETRGSAI E+DLERI RL++LRGVIV TD
Sbjct: 15 LTEKINIQEVLVVEGKDDTANLRRFYEVDTYETRGSAITEEDLERINRLNDLRGVIVLTD 74

Query: 61 PDYNGERIRKIIMNAIPTVRHAFLNRDEAKPGSKTKGRSLGVEHASFEDLQKALSKVTQH 120

PDYNGERIRK+IM A+PT RHAFLNR+EA P SK+KGRSLGVEHA+FEDLQKAL+ VTQ
Sbjct: 75 PDYNGERIRKLIMAAVPTARHAFLNRNEAVPSSKSKGRSLGVEHANFEDLQKALAHVTQQ 134

Query: 121 FDDEDHFDITQADLIRWGFITASDSRKRREYLGNQLRIGYSNGKQLLKRLRLFGVTKAEV 180

+DDE +FDI Q DLIR G + ASDSRKRREYLG +LRIGY+NGKQLLKRL LFG+T AEV
Sbjct: 135 YDDESYFDIRQTDLIRLGLLMASDSRKRREYLGEKLRIGYANGKQLLKRLELFGITLAEV 194

Query: 181 EECMEGY 187

EE ME Y
Sbjct: 195 EEVMETY 201

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
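
The "Identities" count in an alignment summary is the number of aligned columns in which the Query and Sbjct residues are the same, taken over the full alignment length. A minimal Python sketch of that count follows, applied to the last aligned block of this example (EECMEGY against EEVMETY). The function names are illustrative; the sketch marks identities only and does not reproduce the "+" marks for conservative substitutions, which would need a scoring matrix.

```python
def count_identities(query, sbjct, gap="-"):
    """Return (identities, alignment_length) for two equal-length aligned strings."""
    if len(query) != len(sbjct):
        raise ValueError("aligned sequences must have the same length")
    identities = sum(
        1 for q, s in zip(query, sbjct) if q == s and q != gap
    )
    return identities, len(query)


def identity_line(query, sbjct, gap="-"):
    """Build a middle line showing identical residues and spaces elsewhere."""
    return "".join(
        q if (q == s and q != gap) else " " for q, s in zip(query, sbjct)
    )


identities, length = count_identities("EECMEGY", "EEVMETY")
print(identities, length)
print(repr(identity_line("EECMEGY", "EEVMETY")))
```

Summing such per-block counts over every block of an alignment gives the numerator of the "Identities" fraction, under the assumption that the blocks are transcribed without error.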

Example 1842 

A DNA sequence (GBSx1949) was identified in S.agalactiae <SEQ ID 5721> which encodes the amino
acid sequence <SEQ ID 5722>. Analysis of this protein sequence reveals the following:

Possible site: 15

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4955 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10139> which encodes amino acid sequence <SEQ ID 
10140> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11815 GB:Z99104 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 115/254 (45%), Positives = 172/254 (67%)

Query: 28 IFDTHTHLNVENFEGKIDEEINLASELGVTKMNWGFDQDTISKSLELSSQYAQVYSTIG 87

+FDTH HLN E ++ ++E I A V ++ WGFD+ TI++++E+ +Y +Y+ IG
Sbjct: 2 LFDTHAHLNAEQYDTDLEEVIERAKAEKVERIVWGFDRPTITRAMEMIEEYDFIYAAIG 61

Query: 88 WHPTEAGSYDDNIESMIISHLENPKVIALGEIGLDYYWMEDPKDIQIEVFKRQIELSKEY 147

WHP +A + + I + KV+A+GE+GLDY+W + PKDIQ EVF+ QI L+KE
Sbjct: 62 WHPVDAIDMTEEDLAWIKELSAHEKWAIGEMGLDYHWDKSPKDIQKEVFRNQIALAKEV 121

Query: 148 NLPFVVHTRDALEDTYEVIKESGVGPFGGIMHSFSGSLEMAQKFIDLGMMISFSGVVTFK 207

NLP ++H RDA ED ++KE G GGIMH F+GS E+A++ + + +SF G VTFK
Sbjct: 122 NLPIIIHNRDATEDVOTILKEEGAEAVGGIMHCFTGSAEVARECMKMNFYLSFGGPVTFK 181

Query: 208 KALDVQEAARELPLDKILVETDAPYLAPVPKRGRENKTAYTRYVVEKIAELRGITVEEVA 267

A +E +E+P D++L+ETD P+L P P RG+ N+ +Y +YV E+IAEL+ +T EE+A
Sbjct: 182 NAKKPKEWKEIPNDRLLIETDCPFLTPHPFRGKRNEPSYVKYVAEQIAELKEMTFEEIA 241

Query: 268 EATYQNAVRIFRLD 281

T +NA R+FR++
Sbjct: 242 SITTENAKRLFRIN 255





A related DNA sequence was identified in S.pyogenes <SEQ ID 5723> which encodes the amino acid 
sequence <SEQ ID 5724>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2817 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 190/258 (73%), Positives = 227/258 (87%)

Query: 24 DMIKIFDTHTHLNVENFEGKIDEEINLASELGVTKMNWGFDQDTISKSLELSSQYAQVY 83

+ + IFDTHTHLNV F+G EE+ LA E+GV NWGFDQ TIS +L L+++YA +Y
Sbjct: 38 EKLTIFDTHTHLNVAEFCGHETEELTLAQEMGVAYHNWGFDQATISGALTLANKYANIY 97

Query: 84 STIGWHPTEAGSYDDNIESMIISHLENPKVIALGEIGLDYYWMEDPKDIQIEVFKRQIEL 143

+TIGWHPTEAGSY + +E I+S L + KVIALGEIGLDYYWMEDPK++QIEVFKRQ++L
Sbjct: 98 ATIGWHPTEAGSYSEAVEEAIVSQLSHSKVIALGEIGLDYYWMEDPKEVQIEVFKRQMQL 157

Query: 144 SKEYNLPFVVHTRDALEDTYEVIKESGVGPFGGIMHSFSGSLEMAQKFIDLGMMISFSGV 203

+K+++LPFVVHTRDALEDTYEVIK +GVGP GGIMHS+SGSLEMA++FI+LGMMISFSGV
Sbjct: 158 AKDHDLPFVVHTRDALEDTYEVIKAAGVGPRGGIMHSYSGSLEMAERFIELGMMISFSGV 217

Query: 204 VTFKKALDVQEAARELPLDKILVETDAPYLAPVPKRGRENKTAYTRYVVEKIAELRGITV 263

VTFKKALD+QEAA+ LPLDKILVETDAPYL PVPKRG++N TAYTRYVV+KIAELRG+TV
Sbjct: 218 VTFKKALDIQEAAQHLPLDKILVETDAPYLTPVPKRGKQNHTAYTRYVVDKIAELRGMTV 277

Query: 264 EEVAEATYQNAVRIFRLD 281

EEVA+AT NA R+F+LD
Sbjct: 278 EEVAKATTANAKRVFKLD 295

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1843 

A DNA sequence (GBSx1950) was identified in S.agalactiae <SEQ ID 5725> which encodes the amino
acid sequence <SEQ ID 5726>. This protein is predicted to be endosome-associated protein. Analysis of
this protein sequence reveals the following:

Possible site: 31

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.5142 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1844 

A DNA sequence (GBSx1951) was identified in S.agalactiae <SEQ ID 5727> which encodes the amino
acid sequence <SEQ ID 5728>. This protein is predicted to be CGI 7785 gene product. Analysis of this
protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4730 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1845 

A DNA sequence (GBSx1952) was identified in S.agalactiae <SEQ ID 5729> which encodes the amino
acid sequence <SEQ ID 5730>. Analysis of this protein sequence reveals the following:

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4032 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB01041 GB:AB022220 gene_id:MLN21.14~unknown protein
[Arabidopsis thaliana]
Identities = 49/185 (26%), Positives = 85/185 (45%), Gaps = 46/185 (24%)

Query: 5 LTDLDRWIAKQEYELGSQLDTLVKIMSQDKVLPIGKVAHVQ DGGKETGEQIYT 58

L +D V+ + + ELGS+ + +M+ K+ V+ D K+ Q++
Sbjct: 154 LEGIDSVDSGRVKIELGSRGLMDLCVMASKLAYENAI04NLVEFLDCWNDYQKQMSTQVFV 213

Query: 59 ITPNGTLDKPEDVKEVTVLFKGSTAPFGGDDWKTD WFKNDIPIASKL LLKKFG 111

T DK +D + + F+G T PF DDW TD W+ ++P KL L+ G
Sbjct: 214 FT DKQKDANLIVISFRG-TEPFDADDWGTDFDYSWY--EVPNVGKLHMGFLEAMG 265

Query: 112 SQSVSHKQGTKQ LEQSAH LLKE VMNKYPNAKI S VY 146

Q+ S ++ +K+ +E+SA+ +LK +++++ NA+ V
Sbjct: 266 LGNRDDTTTFHYNLFEQTSSEEENSKKNLLDMVERSAYYAVRVILKRLLSEHENARFVVT 325

Query: 147 GHSLG 151

GHSLG
Sbjct: 326 GHSLG 330

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1846 

A DNA sequence (GBSx1953) was identified in S.agalactiae <SEQ ID 5731> which encodes the amino
acid sequence <SEQ ID 5732>. Analysis of this protein sequence reveals the following:

Possible site: 52

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.97 Transmembrane 12 - 28 ( 5 - 33)

Final Results

bacterial membrane Certainty=0.4588 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10141> which encodes amino acid sequence <SEQ ID
10142> was also identified.

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

A related GBS gene <SEQ ID 8909> and protein <SEQ ID 8910> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 4
McG: Discrim Score: 14.01
GvH: Signal Score (-7.5): -5.55
Possible site: 46

>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -8.97 threshold: 0.0

INTEGRAL Likelihood = -8.97 Transmembrane 6 - 22 ( 1 - 27)

PERIPHERAL Likelihood = 9.49 84
modified ALOM score: 2.29

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.4588 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

SEQ ID 8910 (GBS32) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 10 (lane 2; MW 15.6kDa).

GBS32-His was purified as shown in Figure 191, lane 8.

Example 1847

A DNA sequence (GBSxl954) was identified in S.agalactiae <SEQ ID 5733> which encodes the amino 
15 acid sequence <SEQ ID 5734>. This protein is predicted to be extramembranal protein (dltD). Analysis of 
this protein sequence reveals the following: 

Possible site: 31 

>» Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-10.24 Transmembrane 12 - 28 ( 4 - 31) 


Final Results 

bacterial membrane Certainty=0 . 5097 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC29041 GB:AF050517 unknown [Streptococcus mutans] 
Identities = 242/421 (57%) , Positives = 309/421 (72%) , Gaps = 1/421 (0%) 

Query: 1 MLKRLGKVFGPLVCALLLLVGLYFVFPVSQ - PHHLGKEKNSAVALTKAGFKSRVQKVRAF 59 

MLKRL + GP+ CAL+L+ L +P H+ +EKN AVAL+ + FKS +K+RA 

Sbjct: 1 MLKRLWLILGPVFCALVLVFSLIMFYPAKHLSHNYNEEKNDAVALSPSSFKSTNKKMRAL 60 

Query: 60 SDPKANFVPFFGSSEWLRFDAMHPSVLAEAYNRSYIPYLLGQKGAASLTQYYGIQQIKGQ 119 
SD + FVPFFGSSEW R D MHPSVLAE YNRSY PYLLGQKG+ SL+ Y+G+QQI Q 

Sbjct: 61 SDKRHLFVPFFGSSEWQRIDNMHPSVLAERYNRSYRPYLLGQKGSTSLSHYFGMQQIGNQ 120 

Query: 120 IKNKKAIYVISPQWFVRKGANKGAFQNYFSNDQTIRFLQNQTGTTYDRYAARRLLKLYPE 179 
IKNKKA+YVISPQWFV KG + AFQ YFS++Q FL NQTG+T DRYAA+RLL + P 
Sbjct: 121 IKNKKAVYVISPQWFVPKGTSPIAFQQYFSSEQLADFLLNQTGSTADRYAAKRLLDIKPS 180 

Query: 180 ASMSDLIEKVADGQKLSNKDKQRLKFNDWVFEKTDAIFSYLPLGKTYNQAIMPHVGKLPK 239 

+++ +I+K+A G+ L++ D+ L+ +K DA+F L Y + ++PHV KLPK 

Sbjct: 181 SNLQGMIKKIAAGKTLNSFDRASLRLIKSFLKKEDALFGSLTFSDNYERRVLPHVKKLPK 240 

Query: 240 AFSYNHLSRIASQDAKVATRSNQFGIDDRFYQTRIKKHLKKLKGSQRHFNYTKSPEFNDL 299 

FSY LS+IAS+D + T++NQF I+D FY RIK LK+LKG Q+ +Y +SPE+NDL 
Sbjct: 241 HFSYGTLSQIASKDGQRLTKTNQFEINDHFYNKRIKGQLKRLKGFQKQLSYLQSPEYNDL 300 

Query: 300 QLVLNEFSKQNTDVLFVIPPVNKKWTDYTGLDQKMYQKSVEKIKHQLQSQGFNHIADLSR 359 

QL L + +K T V+FVIPPVN KW +YTGL Q MYQK+VEKIK+QLQSQGF++IADLS+ 
Sbjct: 301 QLALTQLAKSKTKVIFVIPPVNAKWVEYTGLSQIMYQKTVEKIKYQLQSQGFDNIADLSK 360 

Query: 360 DGGKPYFMQDTIHLGWNGWLELDKHINPFLTEENSKPNYHINNKFLKKSWAKYTGRPSDYK 420 
+G +PYFMQDTIHLGWNGWL DK +NPFL+++ +P Y INN FL K WA YTG P +K 

Sbjct: 361 NGDQPYFMQDTIHLGWNGWLAFDKEVNPFLSKKQLQPAYKINNHFLSKKWATYTGNPFQFK 421 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5735> which encodes the amino acid 
sequence <SEQ ID 5736>. Analysis of this protein sequence reveals the following: 



WO 02/34771 



PCT/GB01/04789 




Possible site: 41 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-13.06 Transmembrane 7 - 23 ( 1-31) 



Final Results 

bacterial membrane Certainty=0 . 6222 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 209/410 (50%) , Positives = 278/410 (66%) 

Query: 1 MLKRLGKVFGPLVCALLLLVGLYFVFPVSQPHHLGKEKNSAVALTKAGFKSRVQKVRAFS 60 
MLKRL + GPL+ A +L+V F FP H + +EK +AVA+T + FK+ + K +A S 
Sbjct: 1 MLKRLWLILGPLLIAFVLWITIFSFPTQLDHSIAQEKANAVAITDSSFKNGLIKRQALS 60 

Query: 61 DPKANFVPFFGSSEWLRFDAMHPSVLAEAYNRSYIPYLLGQKGAASLTQYYGIQQIKGQI 120 
D FVPFFGSSEW R D+MHPSVLAE Y RSY P+L+G++G+ASL+ YYGIQQI ++ 
Sbjct: 61 DETCRFVPFFGSSEWSRMDSMHPSVLAERYKRSYRPFLIGKRGSASLSHYYGIQQITNEM 120 

Query: 121 
+ KKAI+V+SPQWF +G N A Q Y SN Q I FL ++AA+RLL+L P 
Sbjct: 121 

Query: 181 
S S+L++KV+ G+ LS D+ LK V + +++FS+L Y + I+P V LPK 
Sbjct: 181 

Query: 241 
FSY L+ +A++ ++AT +N+FGI + FY+ RI K Q +++Y SPE+ND Q 
Sbjct: 241 

Query: 301 
L+L+EF+K+ TDVLFVI PVNK W DYTGL+Q YQ +V KIK QL+SQGF+ IAD S+D 
Sbjct: 301 

Query: 361 
GG+ YFMQDTIHLGWNGWL DK + PFL + PNY +N F KM 
Sbjct: 361 GGESYFMQDTIHLGWNGWLAFDKKVQPFLETKQPVPNYKMNPYFYSKIWA 410 

A related GBS gene <SEQ ID 8911> and protein <SEQ ID 8912> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6 
McG: Discrim Score: 15.50 
GvH: Signal Score (-7.5): -4.52 

Possible site: 31 
>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 1 value: -10.24 threshold: 0.0 

INTEGRAL Likelihood =-10.24 Transmembrane 12 - 28 ( 4 - 31) 
PERIPHERAL Likelihood = 8.33 301 

modified ALOM score: 2.55 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0. 5097 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

57.5/76.3% over 420aa 

Streptococcus mutans 

GP|3403204| unknown Insert characterized 
ORF00336 (301 - 1560 of 1860) 

GP|3403204|gb|AAC29041.1| |AF050517(1 - 421 of 421) unknown {Streptococcus mutans} 
%Match =41.0 

% Identity = 57.5 %Similarity = 76.2 
Matches = 242 Mismatches = 99 Conservative Sub.s = 79 

33 63 93 123 153 183 213 243 

FSGFLDLLWFPQPHNK**GVL*WILNQKY*QLLMTYLWRMFLL*WMKTYLTQEF*TAWVLIiN*LLSWKATLILIFRLRm 

10 273 303 333 363 420 450 480 

VVMTGTQLIKLLLE*RSSAMLKRLGKVFGPLVCALLLLVGLYFVFPVSQ-PHHLGKEKNSAVALTKAGFKSRVQKVRAFS 



15 



45 



MLKRLWLILGPVFCALVLVFSLIMFYPAKHLSHNYNEEKNDAVALSPSSFKSTNKKMRALS 
10 20 30 40 50 60 



510 540 570 600 630 660 690 720 

DPKANFVPFFGSSEWLRFDAMHPSVLAFAYNRSYIPYLLGQKGAASLTQYYGIQQIKGQIKNKKAIYVISPQWFVRKGAN 

i : iii-iiiim i i mum iiiii mmm immmi niiiimiiiiim n = 

DKRHLWPFFGSSEWQRIDNMHPSVLAERYNRSYRPYLLGQKGSTSLSHYFGMQQIGNQIKNKKAVYVISPQWFVPKGTS 
20 80 90 100 110 120 130 140 

750 780 810 840 870 900 930 960 

KGAFQlSTYFSraDQTIRFLQNQTGTTYDRYAARRLLKLYPEASMSDLIEK^ADGQKLSNKDKQRLKFNDWVFEKTDAIFSYL 

ill mm ii nihi mimn = i = = = mm i= m h m m im i 

25 PIAFQQYFSSEQLADFLLNQTGSTADRYAAKRLLDIKPSSNLQGMIKKIAAGKTLNSFDRASLRLIKSFLKKEDALFGSL 

160 170 180 190 200 210 220 

990 1020 1050 1080 1110 1140 1170 1200 

PLGKTYNQAIMPHVGKLPKAFSYIfflLSRIASQDAKVATRSNQFGIDDRFYQTRIKiaiLKKLKGSQRHFNYTKSPEFNDLQ 
30 : | : ::||| |||| ||| ||:|||;| : |:=||| |:| || ||| :||:||| |::::| 

TFSDNYERRVLPHVKKLPKHFSYGTLSQIASKTX-KJRLTKINQFEI^ 

240 250 260 270 280 290 300 

1230 1260 1290 1320 1350 1380 1410 1440 

35 LVLNEFSKQNTDVIjFVIPPWKKWTDYTGLDQKMYQKS^ 

i i ==:i i mmm n mn i mmiiimiiiiimimmi mmimimm 

ialtqlaksktkvifvipptoakwveytglsqdmyqktvekikyqlqsqgfdniadlskngdqpyfmqdtihlg™gw^ 

320 330 340 350 360 370 380 

40 1470 1500 1530 1560 1590 1620 1650 1680 

ldkhinpflteenskpnyhinnkflkkswakytgrpsdyk*ivesddl*h*sy*ssflislylvilr*lihvl*ffiyne 



fdkevnpflskkqlqpaykinnhflskkwatytgnpfqfk 

400 410 420 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1848 

A DNA sequence (GBSx1955) was identified in S.agalactiae <SEQ ID 5737> which encodes the amino 
acid sequence <SEQ ID 5738>. This protein is predicted to be a D-alanyl carrier protein (dltC). Analysis of 
this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1061 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC05776 GB:AF051356 D-alanyl carrier protein [Streptococcus mutans] 
Identities = 65/79 (82%), Positives = 74/79 (93%) 

Query: 1 MDIKSEVLAIIDDLFMEDVSSMMDEDLFDAGVLDSMGTVELIVELESHFNIDIPIAEFGR 60 

MDIKSEVL IID+LFMEDVS MMDEDLFDAGVLDSMGTVELIVELE+HF+I +P++EFGR 
Sbjct: 1 MDIKSEVLKIIDELFMEDVSDMMDEDLFDAGVLDSMGTVELIVELENHFDITVPVSEFGR 60 

Query: 61 NDWNTANKIVAGVTELCNA 79 

+DWNTANKI+ G+TEL NA 
Sbjct: 61 DDWNTANKIIEGITELRNA 79 
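
The "Identities" figure reported for an alignment of this kind can be checked by comparing the two aligned rows position by position. The following is a minimal sketch only, assuming the Query and Sbjct rows printed above are concatenated without spaces; the function name `count_identities` is hypothetical and is not part of the alignment program used to generate the output.

```python
def count_identities(query, subject):
    """Return (identical positions, alignment length) for two aligned rows.

    Both rows must have the same length; '-' marks a gap and never counts
    as an identity.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identical, len(query)


# Rows copied from the GBS (Query) and S. mutans (Sbjct) alignment above.
query = ("MDIKSEVLAIIDDLFMEDVSSMMDEDLFDAGVLDSMGTVELIVELESHFNIDIPIAEFGR"
         "NDWNTANKIVAGVTELCNA")
subject = ("MDIKSEVLKIIDELFMEDVSDMMDEDLFDAGVLDSMGTVELIVELENHFDITVPVSEFGR"
           "DDWNTANKIIEGITELRNA")

identical, length = count_identities(query, subject)
percent = round(100 * identical / length)
print(f"Identities = {identical}/{length} ({percent}%)")
```

The "Positives" figure additionally depends on a substitution matrix and is therefore not reproduced by this sketch.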

A related DNA sequence was identified in S.pyogenes <SEQ ID 5739> which encodes the amino acid 
sequence <SEQ ID 5740>. Analysis of this protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0. 3976 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 57/79 (72%) , Positives = 65/79 (82%) 

Query: 1 MDIKSEVLAIIDDLFMEDVSSMMDEDLFDAGVLDSMGTVELIVELESHFNIDIPIAEFGR 60 

M I+ V+ + D LFMEDVS MMDEDLFDAGVLDS+GTVELIVELES FNI +PI+EFGR 
Sbjct: 1 MSIEETVIELFDRLFMEDVSEMMDEDLFDAGVLDSLGTVELIVELESTFNIKVPISEFGR 60 

Query: 61 NDWNTANKIVAGVTELCNA 79 

+DWNT KIV GV EL +A 
Sbjct: 61 DDWNTVTKIVQGVEELQHA 79 
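
The residue numbers printed at either end of each alignment row can be cross-checked against the row itself: the end position should equal the start position plus the number of residues in the row, minus one, with gap characters not counted. The sketch below illustrates this check under that assumption; `row_end` is a hypothetical helper name.

```python
def row_end(start, row):
    """Return the expected end coordinate of an alignment row.

    Only letters are counted as residues, so gap characters ('-') and any
    stray spaces left by text extraction are ignored.
    """
    residues = sum(1 for ch in row if ch.isalpha())
    return start + residues - 1


# Rows taken from the GAS/GBS alignment above.
print(row_end(61, "NDWNTANKIVAGVTELCNA"))
print(row_end(1, "MSIEETVIELFDRLFMEDVSEMMDEDLFDAGVLDSLGTVELIVELESTFNIKVPISEFGR"))
```

A mismatch between the computed and printed end coordinate usually indicates that characters were lost or inserted when the row was transcribed.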

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1849 

A DNA sequence (GBSx1956) was identified in S.agalactiae <SEQ ID 5741> which encodes the amino 
acid sequence <SEQ ID 5742>. Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane Certainty=0. 4418 (Affirmative) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5743> which encodes the amino acid 
sequence <SEQ ID 5744>. Analysis of this protein sequence reveals the following: 

Possible site: 57 
>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-10.14 Transmembrane 387 - 403 ( 382 - 409) 
INTEGRAL Likelihood = -9.66 Transmembrane 18 - 34 ( 15 - 37) 
INTEGRAL Likelihood = -5.95 Transmembrane 64 - 80 ( 63 - 81) 
INTEGRAL Likelihood = -5.63 Transmembrane 92 - 108 ( 89 - 114) 
INTEGRAL Likelihood = -1.97 Transmembrane 40 - 56 ( 40 - 56) 



Final Results 

bacterial membrane Certainty=0.5055 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

>GP:AAC05775 GB:AF051356 integral membrane protein [Streptococcus mutans] 
Identities = 246/413 (59%) , Positives = 319/413 (76%) 

Query: 1 MMMFFSHIPYMEPYGNPIYFVYLILAFLPVIIGIFKQKRLSTYETLVSLVFILFMFGGDH 60 

M+ FF ++P++E YGNP YF Y+ILA LP+ IG+F +KR YE VSL+FI+ M G+ 
Sbjct: 1 MIDFFKNLPHLEAYGNPQYFFYIILAVLPIFIGLFFKKRFPLYEAFVSLIFIVLMLTGEK 60 

Query: 61 YQQLVAFLFYLLWQI I SVFAYQKYRENANSAGVFYLAIAMALFPLIWVKVAPLTGPSSQT 120 

Q+ A FY++WQI V++Y+ YR++ ++ +FYL + M++ PL VK+ P + Q+ 
Sbjct: 61 SHQIFALFFYIIWQIFCVYSYKFYRKSRDNKWIFYLHVFMSILPLSLVKITPAIWTNQQS 120 

Query: 121 LFSFLGISYLTFKSIGMIIEMRDGTLQEVRLPDFIRFMIFFPTFSSGPIDRFRHFQEDYH 180 

LF FLGI SYLTF+S+GMI +EMRDG L +FIRFM+F PTFSSGPIDRFR F +DY 

Sbjct: 121 LFGFLGISYLTFRSVGMIMEMRDGVLTSFTFWEFIRFMLFMPTFSSGPIDRFRRFNDDYE 180 

Query: 181 KLPERDDYFAMLNKAVMYLMLGFLYKHIISYCLGGILLPLLENKALMVGGYFNKETILVM 240 

K+P++D+ ML ++V Y+MLGF YK +++ LG ++LP L+ AL GG+FN T+ VM 
Sbjct: 181 KIPDKDELLDMLEQSVHYIMLGFFYKFVLAQILGTMILPGLKEMALQKGGWFNWPTLGVM 240 

Query: 241 YVYGLNLFFDFAGYSMFAIGISYLLGIRTPENFNMPFLSASLKDFWNRWHMSLSFWFRDY 300 

YVYGL+LFFDFAGYSMFAI IS +GI++P NFN PF S LK+FWNRWHMSLSFWFRD+ 
Sbjct: 241 YVYGLDLFFDFAGYSMFAIAISNFMGIKSPTNFNQPFKSQDLKEFWNRWHMSLSFWFRDF 300 

Query: 301 VFMRLVHLLIKHKTFKNRNVTSGVAYLVNMLVMGFWHGLTWYYIAYGLFHGIGLIINDAW 360 

VFMRLV +L+K+K FKNRNVTS VAY+VNML+MGFWHG+TWYYI YGLFHG+GL++NDAW 
Sbjct: 301 VFMRLVKVLVraKVFKNRNVTSSVAYIVNMLIMGFWHGVTWYYITYGLFHGVGL 360 

Query: 361 IRKKKEINRHRKKKGLSPLFQSRAFHVLCIVVTFHVVMFSLLLFSGFLNDLWF 413 

+RKKK +N+ RK K LSPL ++ L IV+TF+WM S L+FSGFLNDLWF 

Sbjct: 361 LRKKKRliNKERKAKNLSPLPENGWTRALGIVITFNVVMLSFLIFSGFLNDLWF 413 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 240/416 (57%) , Positives = 317/416 (75%) , Gaps = 5/416 (1%) 

Query: 5 FLEKLPHLDVYGNPQYFFYLILAVLPIYIGLFFKKRFALYEIIFSLSFIVMMLTGSTFNQ 64 

F +P+++ YGNP YF YLILA LP+ IG+F +KR + YE + SL FI+ M G + Q 
Sbjct: 4 FFSHIPYMEPYGNPIYFVYLILAFLPVIIGIFKQKRLSTYETLVSLVFILFMFGGDHYQQ 63 

Query: 65 LKSLLAYWGQSLLVFIYKAYRKRFNHTLVFYVTVCLSIFPLFLVKLIPAISEDGHQSLF 124 

L + L Y++ Q + VF Y+ YR+ N VFY+ + +++FPL VK+ P ++ Q+LF 
Sbjct: 64 LVAFLFYLLWQIISVFAYQKYRENANSAGVFYLAIAMALFPLIWVKVAP-LTGPSSQTLF 122 

Query: 125 GFLGISYLTFRAVAMIIEMRDGVLKEFTLWEFLRFLLFFPTFSSGPIDRFKRFNEDYINI 184 

FLGISYLTF+++ MIIEMRDG L+E L +F+RF++FFPTFSSGPIDRF+ F EDY + 
Sbjct: 123 SFLGISYLTFKSIGMIIEMRDGTLQEVRLPDFIRFMIFFPTFSSGPIDRFRHFQEDYHKL 182 

Query: 185 PDRNELLDMLGQAIHYLMLGFLYKFILAYIFGSLIMPPLKELALEQGGVFNWPTLGVMYA 244 

P+R++ ML +A+ YLMLGFLYK I++Y G +++P L+ AL GG FN T+ VMY 
Sbjct: 183 PERDDYFAMIiNKAvMYLMLGFLYKHIISYCLGGILLPLLENKALMVGGYFNKETILVMYV 242 

Query: 245 FGFDLFFDFAGYTMFALAISNLMGIKSPINFDKPFKSRDLKEFWNRWHMSLSFWFRDFVF 304 

+G +LFFDFAGY+MFA+ IS L+GI++P NF+ PF S LK+FWNRWHMSLSFWFRD+VF 
Sbjct: 243 YGLNLFFDFAGYSMFAIGISYLLGIRTPF^FNMPFLSASLKDFWNRWHMSLSFWFRDYVF 302 

Query: 305 MRLVKLLVKNKVFKNRNVTSSVAYIINMLLMGFWHGLTWYYIAYGLFHGIGLVINDAWVR 364 

MRLV LL+K+K FKNRNVTS VAY++NML+MGFWHGLTWYYIAYGLFHGIGL+INDAW+R 
Sbjct: 303 MRLVHLLIKHKTFKNRNWSGVAYLVNMLVMGFWHGLTWYYIAYGLFHGIGLIINDAWIR 362 
Query: 365 KKKNINKERRLAKKPLLP- -ENKWTYALGVFITFNWMFSFLIFSGFLDLLWFPQP 418 
KKK IN+ R+ KK L P +++ + L + +TF+WMFS L+FSGFL+ LWF +P 
Sbjct: 363 KKKEINRHRK- -KKGLSPLFQSRAFHVLCIVVTFHVvMFSLLLFSGFIjNDLWFNRP 416 

A related GBS gene <SEQ ID 8913> and protein <SEQ ID 8914> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: 3.22 
GvH: Signal Score (-7.5): -4.56 

Possible site: 16 
>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 7 value: -8.55 threshold: 0.0 



INTEGRAL Likelihood = -8.55 Transmembrane 93 - 109 ( 91 - 117) 
INTEGRAL Likelihood = -7.64 Transmembrane 21 - 37 ( 19 - 39) 
INTEGRAL Likelihood = -6.79 Transmembrane 390 - 406 ( 387 - 410) 
INTEGRAL Likelihood = -5.20 Transmembrane 41 - 57 ( 40 - 59) 
INTEGRAL Likelihood = -2.07 Transmembrane 203 - 219 ( 200 - 221) 
INTEGRAL Likelihood = -1.65 Transmembrane 65 - 81 ( 65 - 81) 
INTEGRAL Likelihood = -0.75 Transmembrane 125 - 141 ( 125 - 141) 
PERIPHERAL Likelihood = 1.01 322 

modified ALOM score: 2.21 



*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0 . 4418 (Affirmative) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 
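
The "Final Results" block lists one certainty value per candidate compartment, and the predicted localisation is the compartment with the highest value. As an illustrative sketch only, the lines can be parsed as follows; the regular expression and the function names are assumptions introduced here, written to tolerate the irregular spacing seen in this text, and are not part of the prediction program itself.

```python
import re

# Matches e.g. "bacterial membrane Certainty=0 . 4418 (Affirmative)".
# Spaces around the decimal point are tolerated and removed before conversion.
LINE_RE = re.compile(
    r"bacterial\s+(?P<site>[a-z]+)[^A-Za-z]*"
    r"Certainty\s*=\s*(?P<value>\d\s*\.\s*\d+)"
)


def parse_final_results(text):
    """Return a dict mapping compartment name to certainty value."""
    results = {}
    for match in LINE_RE.finditer(text):
        value = re.sub(r"\s+", "", match.group("value"))
        results[match.group("site")] = float(value)
    return results


def most_likely_site(text):
    """Return (compartment, certainty) for the highest certainty value."""
    results = parse_final_results(text)
    return max(results.items(), key=lambda item: item[1])


sample = """
bacterial membrane Certainty=0 . 4418 (Affirmative) < suco
bacterial outside Certainty=0. 0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco
"""
print(most_likely_site(sample))
```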

The protein has homology with the following sequences in the databases: 

ORF01206(313 - 1563 of 1863) 

GP|2952530|gb|AAC05775.1| |AF051356(4 - 419 of 420) integral membrane protein {Streptococcus 
mutans } 
%Match =50.3 

%Identity = 71.0 %Similarity = 86.6 
Matches = 296 Mismatches = 55 Conservative Sub.s = 65 

273 303 333 363 393 423 453 483 

TFDTKWEN*YQRSYERGKQVIQAFLEKLPHLDVYGNPQYFFYLILAVLPIYIGLFFKKRFALYEIIFSLSFIVMMLTGST 

h= UN: llllllllhllllllhlllllllll III II llhllll 
MIDFFKNLPHLEAYGNPQYFFYIILAVLPIFIGLFFKKRFPLYEAFVSLIFIVLMLTGEK 
10 20 30 40 50 60 

513 543 573 603 633 663 693 723 

FNQLKSLLAYWGQSLLVFIYKAYRKRFNHTLVFYVTVCLSIFPLFLVKLIPAISEDGHQSLFGFLGISYLTFRAVAMII 

, :|= :|= l = = I = 1= II III =: =11= I :||:|| llh III = = I I I I I I I I I I I I I I I = I Ih 
SHQIFALFFYIIWQIFOTYSYKFYRKSRDNKWIFYLHVFMSILPLSLVKITPAIWTN-QQSLFGFLGISYLTFRSVGMIM 
70 80 90 100 110 120 130 

753 783 813 843 873 903 933 963 

EMRDGVLKEFTLWEFLRFLLFFPTFSSGPIDRFKRFNEDYINIPDRNELLDMLGQAIHYLMLGFLYKFILAYIFGSLIMP 

lllllll Ihllhlhll llllllllllhllhll llh:||lll! h:|hlllhllhll |:| = = hl 
EMRDGVLTSFTFWEFIRFMLFMPTFSSGPIDRFRRFNDDYEKIPDKDELLDMLEQSVHYIMLGFFYKFVLAQILGTMILP 
150 160 170 180 190 200 210 

993 1023 1053 1083 1113 1143 1173 1203 

PLKELALEQGGVFNWPTLGVMYAFGFDLFFDFAGYTMFAIAISMjMGIKSPIN^ 

llhl|::|l llllllllll = h I I I I I I I I h I I h I I I h I I I I I I I h = I I I h I I I I I I I I I I I I I I I I I I I 
GLKEMALQKGGWFNWPTLGVMYVYGLDLFFDFAGYSMFAIAISNFMGIKSPTNFNQPFKSQDLKEFWNRWHMSLSFWFRD 
230 240 250 260 270 280 290 

1233 1263 1293 1323 1353 1383 1413 1443 

FVFMRLVKLLVKNKVFKNRNVTSSVAYIINMLLMGFWHGLTWTC 

lllllllhlllllllllllimillhllhlllllhlllll lllllhllhlllhllll :||lh I 
FVFMRLvTCVLVTOJKVFKNRNVTS^ 

310 320 330 340 350 360 370 

1473 1503 1533 1563 1593 1623 1653 1683 

5 PENKWTYALGVFITFNVvMFSFLIFSGFLDLLWFPQPHNK**GVIi*^ 

ill II llh llllllhllllllllh III :| 
PENGWTRALGIVITFNWMLSFLIFSGFIiNDLWFADQLSKK 
390 400 410 420 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1850 

A DNA sequence (GBSx1957) was identified in S.agalactiae <SEQ ID 5745> which encodes the amino 
acid sequence <SEQ ID 5746>. Analysis of this protein sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0. 2611 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 10143> which encodes amino acid sequence <SEQ ID 
10144> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC05774 GB:AF051356 D-alanine-D-alanyl carrier protein ligase 
[Streptococcus mutans] 
Identities = 404/510 (79%) , Positives = 465/510 (90%) 

Query: 5 IHDMIKTIEHFAETQADFPVYDILGEVHTYGQLKVDSDSLAAHIDSLGLVEKSPVLVFGG 64 

I DMI TIE+FA+ QA+FPVY+ILGE+HTYG+LK DSDSLAAH+D L L KSPV+VFGG 
Sbjct: 6 IKDMIATIENFAQEQAEFPVYNILGEIHTYGELKADSDSLAAHLDQLDLTAKSPWVFGG 65 

Query: 65 QEYEMIATFVALTKSGHAYIPTOQHSALDRIQAIMTVAQPSLIISIGEFPLEVDNVPILD 124 
QEY MLA+FVALTKSGHAYIP+D HSAL+RI +AI + VA+PSL+I++ +FP++ VP++ 

Sbjct: 66 QEYAMLASFVALTKSGHAYIPIDHHSALERIEAILEVAEPSLVIAVDDFPIDNLQVPVIQ 125 

Query: 125 VSQVSAI FEEKTPYE VTHSVKGDDNYYI I FTSGTTGLPKGVQISHDNLLSFTNWMI SDDE 184 
SQ+ IF++K Y++ H+VKGDD YYIIFTSGTTG PKGVQISHDNLLSFTNWMI+ + 
Sbjct: 126 YSQLEEIFKQKLSYQINHAVKGDDTYYIIFTSGTTGKPKGVQISHDNLLSFTNWMINAEA 185 



Query: 185 FSVPERPQMLAQPPYSFDLSVMYWAPTI^GGTLFALPKTvVNDFKKLFATINELPIQVW 244 

F+ P RPQMLAQPPYSFDLSVMYWAPTLA+GGTLFALPK + DFK+LF TIN+LPI VW 
Sbjct: 186 FATPHRPQMIAQPPYSFDLSvMYWAPTLALGGTLFALPKEITADFKQLFTTINQLPIGVW 245 

Query: 245 TSTPSFADMALLSNDFNSETLPQLTHEYFIX3EELTVKTAQKLRQRFPKARIVNAYGPTEA 304 

TSTPSF DMA+LS+DFN++ LP LTHFYFDGEELTVKTA+KLRQRFP+ARIVNAYGPTEA 
Sbjct: 246 TSTPSFVDMAMLSDDFNAQQLPHLTHFYFDGEELTVKTAKKLRQRFPQARIVNAYGPTEA 305 

Query: 305 TVALSAVAITDEMLETCKRLPIGYTKDDSPTYVIDEEGHKLPNGEQGEIIIAGPAVSKGY 364 

TVALSA+A+TD+MLETCKRLPIGYTK DSPT++IDE GHKL NG+QGEII++GPAVSKGY 
Sbjct: 306 TVALSALAVTDKMLETCKRLPIGYTKPDSPTFIIDESGHKIANGQQGEIIVSGPAVSKGY 365 

Query: 365 LNNPEKTAEAFFQFEGLPAYHTGDLGSMTDEGLLLYGGRMDFQIKFNGYRIELEDVSQNL 424 
LNNPE+TA AFF+FEGLPAYHTGDLGSMTDEGLLLYGGRMDFQIKFNGYRIELE+VSQNL 

Sbjct: 366 LNNPERTAAAFFEFEGLPAYHTGDLGSMTDEGLLLYGGRMDFQIKFNGYRIELEEVSQNL 425 

Query: 425 NKSQYVKSAVAVPRYNKDHKVQNLLAYIVLKEGVRDDFERDLDLTKAIKEDLKDIMMDYM 484 
NKSQY+ SAVAVPRYNKDHKVQNLLAY+VLK+GV + FER LD+TKAIK DL+D+MMDYM 
Sbjct: 426 NKSQYIASAVAVPRYNKDHKVQNLIAYVVLKDGVEEQFERALDITKAIKADLQDVMMDYM 485 

Query: 485 MPSKFIYREDLPLTPNGKIDIKGLMSEVNK 514 

MPSKF+YR+DLPLTPNGKIDIKGLMSEVNK 
Sbjct: 486 MPSKFLYRKDLPLTPNGKIDIKGLMSEVNK 515 


A related DNA sequence was identified in S.pyogenes <SEQ ID 5747> which encodes the amino acid 
sequence <SEQ ID 5748>. Analysis of this protein sequence reveals the following: 

Possible site: 60 
>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -2.28 Transmembrane 92 - 108 ( 91 - 108) 

INTEGRAL Likelihood = -0.85 Transmembrane 43 - 59 ( 41 - 59) 

Final Results 

bacterial membrane Certainty=0 . 1914 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

>GP:AAC05774 GB:AF051356 D-alanine-D-alanyl carrier protein ligase 
[Streptococcus mutans] 

Identities = 365/511 (71%) , Positives = 438/511 (85%) 

Query: 2 IKDMIDSIEQFAQTQADFPVYDCLGERRTYGQLKRDSDSIAAFIDSLALLAKSPVLVFGA 61 
IKDMI +IE FAQ QA+FPVY+ LGE TYG+LK DSDS+AA +D L L AKSPV+VFG 
Sbjct: 6 IKDMIATIENFAQEQAEFPVYNILGEIHTYGELKADSDSLAAHLDQLDLTAKSPVWFGG 65 

Query: 62 QTYDMLATFVALTKSGHAYIPVDVHSAPERILAI IEIAKPSLI I AI EEFPLTIEGI SLVS 121 

Q Y MLA+FVALTKSGHAYIP+D HSA ERI AI+E+A+PSL+IA+++FP+ + ++ 
Sbjct: 66 QEYAMLASFVALTKSGHAYIPIDHHSALERIEAILEVAEPSLVIAVDDFPIDNLQVPVIQ 125 


Query: 122 LSEIESAKLAEMPYERTHSVKGDDNYYIIFTSGTTGQPKGVQISHDNLLSFTNWMIEDAA 181 

S++E ++ Y+ H+VKGDD YYIIFTSGTTG+PKGVQISHDNLLSFTNWMI A 

Sbjct: 126 YSQLEEI FKQKLSYQINHAVKGDDTYYI I FTSGTTGKPKGVQI SHDNLLSFTNWMINAEA 185 

Query: 182 FDVPKQPQMLAQPPYSFDLSVMYWAPTLALGGTLFALPKELVADFKQLFTTIAQLPVGIW 241 

F P +PQMLAQPPYSFDLSVMYWAPTLALGGTLFALPKE+ ADFKQLFTTI QLP+G+W 
Sbjct: 186 FATPHRPQMLAQPPYSFDLSVMYWAPTLALGGTLFALPKEITADFKQLFTTINQLPIGvW 245 

Query: 242 TSTPSFADMAMLSDDFCQAKMPALTHFYFDGEELTVSTARKLFERFPSAKIINAYGPTEA 301 
TSTPSF DMAMLSDDF ++P LTHFYFDGEELTV TA+KL +RFP A+I+NAYGPTEA 

Sbjct: 246 TSTPSFVDMAMLSDDFNAQQLPHLTHFYFDGEELTVKTAKKLRQRFPQARIVNAYGPTEA 305 

Query: 302 TVALSAIEITREMVDNYTRLPIGYPKPDSPTYIIDEDGKELSSGEQGEII VTGPAVSKGY 361 
TVALSA+ +T +M++ RLPIGY KPDSPT+IIDE G +L++G+QGEIIV+GPAVSKGY 
Sbjct: 306 TVALSALAVTDKMIiETCKRLPIGYTKPDSPTFIIDESGHKLANGQQGEIIVSGPAVSKGY 365 

Query: 362 LNNPEKTAEAFFTFKGQPAYHTGDIGSLTEDNILLYGGRLDFQIKYAGYRIELEDVSQQL 421 

LNNPE+TA AFF F+G PAYHTGD+GS+T++ +LLYGGR+DFQIK+ GYRIELE+VSQ L 
Sbjct: 366 LNNPERTAAAFFEFEGLPAYHTGDLGSMTDEGLLLYGGRMDFQIKFNGYRIELEEVSQNL 425 


Query: 422 NQSPMVASAVAVPRYNKEHKVQKLLAYIWKDGVKERFDRELELTKAIKASVKDHMMSYM 481 

N+S +ASAVAVPRYNK+HKVQNLLAY+V+KDGV+E+F+R L++TKAIKA ++D MM YM 
Sbjct: 426 NKSQYIASAVAVPRYNKDHKVQNLLAYVVLKDGVEEQFERALDITKAIKADLQDVMMDYM 485 

Query: 482 MPSKFLYRDSLPLTPNGKIDIKTLINEVNNR 512 

MPSKFLYR LPLTPNGKIDIK L++EVN + 
Sbjct: 486 MPSKFLYRKDLPLTPNGKIDIKGLMSEVNKK 516 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 374/510 (73%) , Positives = 439/510 (85%) 

Query: 4 MIHDMIKTIEHFAETQADFPVYDILGEVHTYGQLKVDSDSLAAHIDSLGLVEKSPVLVFG 63 

MI DMI +IE FA+TQADFPVYD LGE TYGQLK DSDS+AA IDSL L+ KSPVLVFG 
Sbjct: 1 MIKDMIDSIEQFAQTQADFPVYDCLGERRTYGQLKRDSDSIAAFIDSLALLAKSPVLVFG 60 

Query: 64 GQEYEMIATFVALTKSGHAYIPVDQHSALDRIQAIMTVAQPSLIISIGEFPLEVDNVPIL 123 
Q Y+MLATFVALTKSGHAYIPVD HSA +RI AI+ +A+PSLII+I EFPL ++ + ++ 
Sbjct: 61 AQTYDMLATFVALTKSGHAYIPVDVHSAPERILAIIEIAKPSLIIAIEEFPLTIEGISLV 120 

Query: 124 DVSQVSAIFEEKTPYEVTHSVKGDDNYYIIFTSGTTGLPKGVQISHDNLLSFTNWMISDD 183 
+S++ + + PYE THSVKGDDNYYIIFTSGTTG PKGVQISHDNLLSFTNWMI D 
Sbjct: 121 SLSEIESAKLAEMPYERTHSVKGDDNYYIIFTSGTTGQPKGVQISHDNLLSFTNWMIEDA 180 

Query: 184 EFSVPERPQMLAQPPYSFDLSVMYWAPTIAMGGTLFALPKTVVNDFKKLFATINELPIQV 243 
F VP++PQMLAQPPYSFDLSVMYWAPTLA+GGTLFALPK +V DFK+LF TI +LP+ + 
Sbjct: 181 AFDVPKQPQMLAQPPYSFDLSVMYWAPTLALGGTLFALPKELVADFKQLFTTIAQLPVGI 240 

Query: 244 WTSTPSFADMALLSNDFNSETLPQLTHFYFDGEELTVKTAQKLRQRFPKARIVNAYGPTE 303 
WTSTPSFADMA+LS+DF +P LTHFYFDGEELTV TA+KL +RFP A+I+NAYGPTE 
Sbjct: 241 WTSTPSFADMAMLSDDFCQAKMPALTHFYFDGEELTVSTARKLFERFPSAKIINAYGPTE 300 

Query: 304 ATVALSAVAITDEMLETCKRLPIGYTKDDSPTYVIDEEGHKLPNGEQGEIIIAGPAVSKG 363 
ATVALSA+ IT EM++ RLPIGY K DSPTY+IDE+G +L +GEQGEII+ GPAVSKG 
Sbjct: 301 ATVALSAIEITREMVDNYTRLPIGYPKPDSPTYIIDEDGKELSSGEQGEIIVTGPAVSKG 360 

Query: 364 YLNNPEKTAEAFFQFEGLPAYHTGDLGSMTDEGLLLYGGRMDFQIKFNGYRIELEDVSQN 423 
YLNNPEKTAEAFF F+G PAYHTGD+GS+T++ +LLYGGR+DFQIK+ GYRIELEDVSQ 
Sbjct: 361 YLNNPEKTAEAFFTFKGQPAYHTGDIGSLTEDNILLYGGRLDFQIKYAGYRIELEDVSQQ 420 

Query: 424 LNKSQYVKSAVAVPRYNKDHKVQNLLAYIVLKEGVRDDFERDLDLTKAIKEDLKDIMMDY 483 
LN+S V SAVAVPRYNK+HKVQNLLAYIV+K+GV++ F+R+L+LTKAIK +KD MM Y 
Sbjct: 421 LNQSPMVASAVAVPRYNKEHKVQNLLAYIVVKDGVKERFDRELELTKAIKASVKDHMMSY 480 

Query: 484 MMPSKFIYREDLPLTPNGKIDIKGLMSEVN 513 
MMPSKF+YR+ LPLTPNGKIDIK L++EVN 
Sbjct: 481 MMPSKFLYRDSLPLTPNGKIDIKTLINEVN 510 





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1851 

A DNA sequence (GBSx1958) was identified in S.agalactiae <SEQ ID 5749> which encodes the amino 
acid sequence <SEQ ID 5750>. This protein is predicted to be a histidine protein kinase (phoR). Analysis of 
this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-13.64 Transmembrane 9 - 25 ( 5 - 32) 
INTEGRAL Likelihood =-11.62 Transmembrane 136 - 152 ( 132 - 164) 

Final Results 

bacterial membrane Certainty=0. 6456 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB54569 GB:AJ006392 histidine kinase [Streptococcus pneumoniae] 
Identities = 105/416 (25%) , Positives = 197/416 (47%) , Gaps = 56/416 (13%) 

Query: 7 KKFVFLTMSILI VVVLFLFAVSNRYNQYWDEYDAYRIvKLVAKNDY LGIPGDEPIAL 63 

+ F+F+ + + ++V+ L + NR + + ++ L+A DY L + G I 
Sbjct: 12 RDFIFILILLGFILvWLLLLFJTRRDNIQLKQVNQKVKDLIA-GDYSKVLDMQGGSEITN 70 

Query: 64 VTIDNQKMVKIQSNNTDLTNDVIEKSSLKL LEQGKKSRKWKSFIYSIKE 112 

+T + + ++ LT + +E+ S+L +G+ + 11 + 

Sbjct: 71 ITNNLNDLSEV IRLTQENLEQESKRLNSILFYMTDG VLATNRRGQI IMINDTAKKQ 126 
Query: 113 YKDKTYTIAIMDLASYEVPYARRFLILVFT 1 FGFCLLAAVSLYLSR 158 

K+ , +I++L EYRLI ' I G L V L R 

Sbjct: 127 LGLVKEDVLNRSILELLKIEENYELRDLITQSPELLLDSQDINGEYLNLRVRFALIRRES 186 

Query: 159 -FIVGPVE TEMTREKQ FVSDASHELKTPIAAIRANVQVLEQ QIPGNR 204 

FI G V TE +E++ FVS+ SHEL+TP+ ++++ ++ L++ + 

Sbjct: 187 GFISGLVAVLHDTTEQEKEERERRLFVSNVSHELRTPLTSVKSYLEALDEGALCETVAPD 246 

Query: 205 YLDHWSETKRMEFLIEDLmLSRLDEKKSKATNFKKLlSnjSVLCQEVLLTYESLAYEEEKC 264 
++ + ET RM ++ DLL+LSR+D S ++ + +N + +L ++ + +E++ 

Sbjct: 247 FIKVSLDETNR^RMVTDLLHLSRIDNATSHLDVELINFTAFITFILNRFDKMKGQEKEK 306 

Query: 265 LNDTIED DVWIVGEESQIKQILIILLDNAIRHSLSKSAIQFSLKQARRKAILTISN 320 

+ + D +W+ + ++ Q++ +L+NAI++S I +K + IL+IS+ 

Sbjct: 307 KYELVRDYPINSIWMEIDTDKMTQVVDNILNNAIKYSPDGGKITVRMKTTEDQMILSISD 366 

Query: 321 PSAIYSKEvMDNLFERFYQAKDDHADSLS FGLGLSIAKAIVERHKGRIRAYQE 373 

K+ + +F+RFY+ D A S + GLGLSIAK I+++HKG I A E 
Sbjct: 367 HGLGI PKQDLPRI FDRFYRV- - DRARSRAQGGTGLGLS I AKE 1 1 KQHKGFI WAKSE 420 


A related sequence was also identified in GAS <SEQ ID 9131> which encodes the amino acid sequence 
<SEQ ID 9132>. Analysis of this protein sequence reveals the following: 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-11.30 Transmembrane 9 - 25 ( 4-33) 
INTEGRAL Likelihood =-10.35 Transmembrane 161 - 177 ( 154 - 190) 

PERIPHERAL Likelihood = 4.35 142 

Final Results 

bacterial membrane Certainty=0. 5522 (Affirmative) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 94/406 (23%) , Positives = 190/406 (46%) , Gaps = 31/406 (7%) 


Query: 1 MFSDLRKKFVFLTMSILIWVLFLFAVSNRYNQYWDEYDAYRIVKLVAKNDYLGIPGDEP 60 

MF+ +R +F+ + + +++ + + N Y + + RI+ L++ N +PG 
Sbjct: 10 MFNRIRIRFIMIASIAIFIILSSIVGIINTARCYQSQQEINRILHLISSNKGK-LPGTTE 68 

Query: 61 IAL VTIDNQKMVKIQS NNTDLTNDVIEKSSLKLLE QGK 98 

+ ++ D+ + S N L+++ S+L E + K 

Sbjct: 69 SSKRLGTKLSEDSLSQFRYYSVIFNANGHLLSSNTANISALDREEAQYFARLFAKSGEEK 128 

Query: 99 KSRKWKSFIYS- -IKEYKDKTYTIAIMDLASYEVPYARRFLILVFTIFG-FCLLAAVSLY 155 
S + + +YS I + ++ + I+D Y + V FG F + 

Sbjct: 129 GSYRHQDSvYSYLITQLPNEEKLWILDTTFYFRSVGDLLAVSVMLAFGGFIFFWLVSL 188 

Query: 156 LSRFIVGPVETEMTREKQFVSDASHELKTPIAAIRANVQVLEQQIPGNRYLDHWSETKR 215 
S ++ P ++++F+++A HELKTP+A I AN +++E + + + KR 

Sbjct: 189 FSGWIKPFVQNYEKQRRFITNAGHELKTPIAIISANNELVELMTGESEWTKSTSDQVKR 248 

Query: 216 MEFLIEDLLNLSRLDEKRSKVNFKKIOTiSVLCQEvLLTYESLAYEEEKCLNDTIEDDVWI 275 

+ LI ++ L+RL+E+ V ++ S + Q+ ++SL ++ K + TI+ ++ I 
Sbjct: 249 LTGLINQMITLARLEEQPDW-LHMVDFSAIAQDAAEDFKSLVLKDGKRFDLTIQPNIMI 307 


Query: 276 VGEESQIKQILI ILLDNAIRHSLSKSAIQFSLK QARRKAILTISNPSAIYSKEVMDN 332 

EE + +++ IL+DNA ++ K ++ SL + R++A L +SN 
Sbjct: 308 KAEEKSLFELWILvDNANKYCDPKGLVKVSLTTIGRRRKRAKLEVSNTYLEGKSIDYSR 367 

Query: 333 LFERFYQAKDDH - ADSLS FGLGLS I AKAIVERHKGRIRAYQEKDQL 377 

FERFY+ + H + +G+GLS+A+++V+ KG I + D + 
Sbjct: 368 FFERFYREDESHNSKEKGYGIGLSMAESMVKLFKGTITVNYKNDAI 413 

A related GBS gene <SEQ ID 8915> and protein <SEQ ID 8916> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7 
McG: Discrim Score: 17.50 
GvH: Signal Score (-7.5): -2.9 

Possible site: 26 
>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 2 value: -13.64 threshold: 0.0 

INTEGRAL Likelihood =-13.64 Transmembrane 9 - 25 ( 5 - 32) 
INTEGRAL Likelihood =-11.62 Transmembrane 136 - 152 ( 132 - 164) 

PERIPHERAL Likelihood = 2.49 345 
modified ALOM score: 3.23 






*** Reasoning Step: 3 



Final Results 

bacterial membrane Certainty=0 . 6456 (Affirmative) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 
bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 
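
In the outputs reproduced in this section, the "ALOM program" summary line appears to agree with the INTEGRAL rows: "count" equals the number of INTEGRAL segments and "value" equals the most negative INTEGRAL likelihood (here, count: 2 and value: -13.64). The following sketch parses INTEGRAL rows and recomputes both figures; that relationship is an observation from the listings shown here rather than a documented rule, and the regular expression and function names are hypothetical.

```python
import re

# Matches e.g. "INTEGRAL Likelihood =-13.64 Transmembrane 9 - 25 ( 5 - 32)".
INTEGRAL_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral_rows(text):
    """Return a list of dicts, one per predicted transmembrane segment."""
    segments = []
    for m in INTEGRAL_RE.finditer(text):
        segments.append({
            "likelihood": float(m.group(1)),
            "core": (int(m.group(2)), int(m.group(3))),
            "extended": (int(m.group(4)), int(m.group(5))),
        })
    return segments


def alom_summary(text):
    """Return (segment count, most negative likelihood)."""
    segments = parse_integral_rows(text)
    return len(segments), min(s["likelihood"] for s in segments)


sample = """
INTEGRAL Likelihood =-13.64 Transmembrane 9 - 25 ( 5 - 32)
INTEGRAL Likelihood =-11.62 Transmembrane 136 - 152 ( 132 - 164)
PERIPHERAL Likelihood = 2.49 345
"""
print(alom_summary(sample))
```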

The protein has homology with the following sequences in the databases: 

28.3/57.2% over 371aa 

Listeria monocytogenes 

GP | 6117973 | LisK Insert characterized 



ORF00341(631 - 1452 of 1785) 

GP|6117973|gb|AAF03933.1|AF139908_3|AF139908(105 - 476 of 483) LisK {Listeria 
monocytogenes} 
%Match =8.4 
%Identity = 28.2 %Similarity = 57.1 

Matches = 79 Mismatches = 113 Conservative Sub.s = 81 



459 489 519 549 579 609 639 669 

VKLVAKOTJYLGIPGDEPIALVTIDNQKMVKIQSNNTDLTN^ 

I : = I I = I :|= = h = I = I I == I = 

C^IGQMLLKEEEPEVKELLLATTSTLTNQDLTDNEEIKYLFNNDKTVNRKLQDQVINLYDKDGHFINKYYFSRSQDITSI 

50 60 70 80 90 100 110 



699 729 756 
40 DLASYEVPYARRFLILVFTIFG FCLLAAVSLYLSRFI - - 

l== I I =|:= II I II ll=l=:| = 

DFSQYFVSGTDKFIMNKPTIDGQKMMTAQMPIVADDNTTVIGYAQVVNPLTSYNRMMDRLLVTMILLGAVALFISGMLGY 

130 140 150 160 170 180 190 

45 783 813 843 873 

VGPVETEMTREKQFVSDASHELKTPIAAIRA 

: :|| lllllhlh = 

LLAQNFIiNPLTRLARTMNDIRKNGFQKRIETKTNSRDEIGELTVVFNDMMTRIETSFEQQKQFVEDASHELRTPVQIMEG 

210 220 230 240 250 260 270 



918 948 978 1008 1038 1068 1098 

NVQVLEQ - - - QI PG- -NRYLDHWSETKRMEFLIEDLLNLSRLDEKRSKVNFKKLNLS VLCQEVLLTYESLAYEEEKCLN 



HLKLLTRWGKDDPAVLDESLNASLTELERMKKLVQEMLDLSRAEQISQTKELQITDVNATVEQVRRNFE-VMYENFTFTL 
55 290 300 310 320 - 330 340 350 

1128 1158 1188 1218 1248 1278 1308 1335 

DTIEDDVWIVGEESQIKQILIILLDNAIRHSLSKSAIQFSLKQARRKAILTISNPSAIYSKEVMDNLFERFYQA-KDDHA 

: |= : : : : : : | | | | | : : | | | : : : | : : :::::::: |:| :| :| |||= I 

60 ICEDDTDLRALIQHNHLEQILIIIMDNAVKYSGDGTEVDMHVYKEQKQIHIDVRDYGEGISQEEIDKIFNRFYRVDKARSR 
370 380 390 400 ' 410 420 430 

1365 1395 1425 1452 1482 . 1512 1542 1572 

DSLSFGLGLSIAKAIVERHKGRIRAYQEKDQ-LRLEVQLPIDGFWTNTMIN*RKNDETIFIFYW*W 
65 = lllhlll =11 : I I I II: ::: || 
EKGGNGLGLAIAKQLVEGYLGTINAVSEPDKGTTIKITLPYIEPKSK 
450 460 470 480 

SEQ ID 5750 (GBS34) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 15 (lane 9; MW 69kDa). 

GBS34-GST was purified as shown in Figure 193, lane 9. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1852 

A DNA sequence (GBSx1959) was identified in S.agalactiae <SEQ ID 5753> which encodes the amino
acid sequence <SEQ ID 5754>. This protein is predicted to be two-component response regulator (regX3).
Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.1986 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco
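
The three "Final Results" lines above can be read programmatically. The following is a minimal sketch in Python and is not part of the original analysis: the names `parse_final_results` and `predicted_site` are hypothetical, and the regular expression assumes only the line shape printed above (it tolerates the stray spaces that appear inside some certainty values in this text).

```python
import re

# One "Final Results" line: a localization site followed by its certainty.
# The character class allows spaces inside the number (e.g. "0 . 1986").
RESULT_LINE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*Certainty\s*=\s*([0-9][0-9 .]*)"
)


def parse_final_results(text):
    """Return {site: certainty} for every result line found in text."""
    scores = {}
    for site, raw in RESULT_LINE.findall(text):
        scores[site] = float(raw.replace(" ", ""))
    return scores


def predicted_site(text):
    """Return the site with the highest certainty (assumes at least one line)."""
    scores = parse_final_results(text)
    return max(scores, key=scores.get)
```

Taking the site with the highest certainty mirrors the "(Affirmative)" flag printed next to one line in each block; that correspondence is an assumption drawn from the listings in this section.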

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB04091 GB:AP001508 two-component response regulator [Bacillus halodurans] 
Identities = 98/223 (43%) , Positives = 145/223 (64%) , Gaps = 5/223 (2%) 



Query: 2 RLLVVEDEKSIAEAIQALIADKGYSVDIAFDGDDGLEYILTGLYDLVLLDIMLPKRSGLS 61

R+L++EDEK IA +Q L +GY D AF G DGLE +DLVLLD+MLP+ SGL
Sbjct: 3 RILIIEDEKKIARVLQLELEHEGYETDAAFSGSDGLETFQAHAWDLVLLDVMLPELSGLE 62

Query: 62 VLKRVREAGLETPIIFLTAKSQTYDKVNGLDLGADDYITKPFEADELLARIR--LRTRQS 119

VL+R+R TPII LTA++ DKV+GLDLGA+DYITKPFE +ELLAR+R LRT Q+
Sbjct: 63 TORRIRMTDPTOPIILLTARNSIPDKVSGLDLGANDYITKPFEIEELIARVRACLRTVQT 122

Query: 120 SLIRANQLRLGNIRLNTDSHELESKESSVKLSNKEFJjLMEVFMRNAKQIIPKNQLISKVW 179

+ L + +N + +++ +++L+ KEF L+ F++N Q++ + Q+++ VW
Sbjct: 123 RERVEDTLMFQELTINEKTRDVQRGNETIELTPKEFELLVFFIKNKGQVLSREQILTNVW 182

Query: 180 GPSDNSEYNQLEVFISFLRKKLRFLKADIEIITTKGFGYSLEE 222

G + N ++V++ +LRKKL +A + T +G GY L+E
Sbjct: 183 GFDYYGDTNVIDVYVRYLRKKLSLTEA---LQTVRGVGYRLKE 222


Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1853 

A DNA sequence (GBSx1960) was identified in S.agalactiae <SEQ ID 5755> which encodes the amino
acid sequence <SEQ ID 5756>. This protein is predicted to be 50S ribosomal protein L34-related protein.
Analysis of this protein sequence reveals the following:

Possible site: 32 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.5923 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC22660 GB:U32781 ribosomal protein L34 (rpL34) [Haemophilus influenzae Rd] 
Identities = 32/44 (72%) , Positives = 37/44 (83%) 

Query: 1 MKRTYQPSKIRRQRKHGFRHRMSTKNGRRVLASRRRKGRKVLSA 44

MKRT+QPS ++R R HGFR RM+TKNGR+VLA RR KGRK LSA
Sbjct: 1 MKRTFQPSVLKRSRTHGFRARMATKNGRQVLARRRAKGRKSLSA 44
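
The bracketed percentages in the alignment headers of this section are consistent with integer truncation rather than rounding, and the Positives figure is consistent with the truncated identity percentage plus the separately truncated percentage of non-identical positive positions. This is an observation about the figures printed here, not a documented property of the alignment program. A minimal Python sketch under that assumption (the function names are hypothetical):

```python
import re

# Matches "Identities = 98/223", "Positives = 145/223", "Gaps = 5/223".
FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_counts(header):
    """Return {field: (count, alignment_length)} for one header line."""
    return {name: (int(n), int(d)) for name, n, d in FIELD.findall(header)}


def truncated_percentages(header):
    """Recompute the bracketed percentages using integer truncation."""
    counts = parse_counts(header)
    ident, length = counts["Identities"]
    result = {"Identities": 100 * ident // length}
    if "Positives" in counts:
        pos, _ = counts["Positives"]
        # Assumption: positives = identities + conservative substitutions,
        # with each share truncated separately before being added.
        result["Positives"] = result["Identities"] + 100 * (pos - ident) // length
    if "Gaps" in counts:
        gaps, _ = counts["Gaps"]
        result["Gaps"] = 100 * gaps // length
    return result
```

Applied to the header above (32/44 identities, 37/44 positives) this reproduces the printed 72% and 83%; plain rounding of 37/44 would give 84%.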

A related DNA sequence was identified in S.pyogenes <SEQ ID 5757> which encodes the amino acid 
sequence <SEQ ID 5758>. Analysis of this protein sequence reveals the following: 

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.5385 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 42/44 (95%) , Positives = 44/44 (99%) 

Query: 1 MKRTYQPSKIRRQRKHGFRHRMSTKNGRRVLASRRRKGRKVLSA 44 

+KRTYQPSKIRRQRKHGFRHRMSTKNGRRVLA+RRRKGRKVLSA 
Sbjct: 1 VKRTYQPSKIRRQRKHGFRHRMSTKNGRRVLAARRRKGRKVLSA 44 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1854 

A DNA sequence (GBSx1961) was identified in S.agalactiae <SEQ ID 5759> which encodes the amino
acid sequence <SEQ ID 5760>. Analysis of this protein sequence reveals the following: 

Possible site: 61

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -5.79 Transmembrane 122 - 138 ( 115 - 141) 
INTEGRAL Likelihood = -4.35 Transmembrane 19 - 35 ( 15 - 40) 
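
The "INTEGRAL" lines above follow a fixed shape: a likelihood, a residue range, and a second range in parentheses. A minimal Python sketch for extracting them is given below; the names `Segment` and `parse_segments` are hypothetical, and the parenthesised pair is simply stored as an outer range without further interpretation.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int
    outer_start: int
    outer_end: int


# Example line:
# INTEGRAL Likelihood = -5.79 Transmembrane 122 - 138 ( 115 - 141)
INTEGRAL_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return one Segment per INTEGRAL line found in text, in order."""
    return [
        Segment(float(lik), int(a), int(b), int(c), int(d))
        for lik, a, b, c, d in INTEGRAL_LINE.findall(text)
    ]
```

Both predicted segments listed above span 17 residues (`end - start + 1`).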



Final Results 

bacterial membrane Certainty=0.3314 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF95990 GB:AE004350 conserved hypothetical protein [Vibrio cholerae] 
Identities = 79/145 (54%) , Positives = 117/145 (80%) 

Query: 1 MKTFVNNASKTVLSLWFGVMPTIMTVGTIALIISVSTPIFKILGTPFLPFLELLGIPEAD 60

+++ + + + + FGV+P +M +GTIAL+I+ T +F +LG PF+PFLELLG+PEA
Sbjct: 314 VQSVIGEGIRNAVDMVFGVLPVVMGLGTIALVIAEYTSVFSLLGQPFIPFLELLGVPEAT 373

Query: 61 IASQTMIVGFSDMVVPSIMAAEIHSEMTRFIVATVSIVQLIYMSETGAVILGSKIPINIL 120

AS+T++VGF+DM +P+I+AA I +EMTRF++A +S+ QLIYMSE GA++LGS+IP+NI+
Sbjct: 374 AASKTIVVGFADMFIPAIIAASIDNEMTRFVIAAMSVTQLIYMSEVGALLLGSRIPVNIV 433

Query: 121 ELFIIFIERTIISLPIIVLMAHLFF 145

ELF+IFI RT+I+LP+I +AHL F
Sbjct: 434 ELFVIFILRTLITLPVIAAVAHLLF 458



No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1855 

A DNA sequence (GBSx1962) was identified in S.agalactiae <SEQ ID 5761> which encodes the amino
acid sequence <SEQ ID 5762>. This protein is predicted to be D,D-carboxypeptidase (dacA-2). Analysis of
this protein sequence reveals the following:

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2443 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 9485> which encodes amino acid sequence <SEQ ID 9486>
was also identified. A further related GBS nucleic acid sequence <SEQ ID 10945> which encodes amino
acid sequence <SEQ ID 10946> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA67776 GB:X99400 D,D-carboxypeptidase [Streptococcus pneumoniae]
Identities = 193/383 (50%), Positives = 282/383 (73%), Gaps = 6/383 (1%)

Query: 1 MAVDLDSGKILYEKDANKPAAIASLTKIMTVYMVYKEIDNGNLKWNTKVNISDYPYQLTR 60

+AV+ ++GKILYEKDA +P IAS+TK++TVY+VY+ ++NG++ +T V+ISDYPYQLT
Sbjct: 33 IAVEANTGKILYEKDATQPVEIASITKLITVYLVYEALENGSITLSTPVDISDYPYQLTT 92

Query: 61 ESDASNVPLEKRRYTVKQLVDAAMISSANSAAIALAEHISGTESKFVDKMTAQLEKWGIH 120

S+ASN+P+E R YTV++L++A ++SSANSAAIALAE I+G+E FVD M A+L +WGI
Sbjct: 93 NSEASNIPMEARNYTVEELLEATLVSSANSAAIALAEKIAGSEKDFVDmRAKlLEWGIQ 152

Query: 121 DSHLVNASGLNNSMLGNHIYPKSSQNDENKMSARDIAIVAYHLVNEYPSILKITSKSVAK 180

D+ +VN +GLNN LG++IYP S +++ENK+SA D+AIVA +L+ +YP +L+IT K +
Sbjct: 153 DATVVNTTGLNNETLGDNIYPGSKKDEENKLSAYDVAIVARNLIKKYPQVLEITKKPSST 212

Query: 181 FDKDIMHSYNYMLPDMPVFRPGITGLKTGTTELAGQSFIATSTESGMRLLTVIMHADKAD 240

F + S NYML MP +R G GLKTGTT+ AG+SF+ T+ E GMR++TV+++AD D
Sbjct: 213 FAGMTITSTNYMLEGMPAYRGGFDGLKTGTTDKAGESFVGTTVEKGMRVITVVLNADHQD 272

Query: 241 KDKYARFTATNSLLNYITNTYEPNLVIAKGAAYKGKFASVRDGKEQSVIAVAKNDLKVVQ 300

+ YARFTAT+SL++YI++T+ ++ +G AY+ +A V+DGKE +VIAVA D+ +++
Sbjct: 273 NNPYARFTATSSLMDYISSTFTLRKIVQQGDAYQDSKAPVQDGKEDTVIAVAPEDIYLIE 332

Query: 301 KKNITKQNQLKINF---KKELTAPITKKENLGKAYYVDLNKVGKGYLIKE-PSVHLVAKD 356

+ + Q+ + F K + AP+ +G Y D + +G+GY+ E PS +VA
Sbjct: 333 R--VGNQSSQSVQFTPDSKAIPAPLEAGTVVGHLTYEDKDLIGQGYITTERPSFEMVADK 390

Query: 357 SIERSFFLKVWWNHFVRYVNEKL 379

IE++FFLKVWWN FVR+VNEKL
Sbjct: 391 KIEKAFFLKVWWNQFVRFVNEKL 413

A related DNA sequence was identified in S.pyogenes <SEQ ID 5763> which encodes the amino acid
sequence <SEQ ID 5764>. Analysis of this protein sequence reveals the following:

Possible site: 21

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 176/380 (46%) , Positives = 257/380 (67%) , Gaps = 3/380 (0%) 



Query: 1 MAVDLDSGKILYEKDANKPAAIASLTKIMTVYMVYKEIDNGNLKWNTKVNISDYPYQLTR 60

+AVDL+SGK+LYEKDA + +AS++K++T Y+VYKE+ G L W++ V IS+YPY+LT
Sbjct: 33 IAVDLESGKVLYEKDAKEVVPVASVSKLLTTYLVYKEVSKGKLNWDSPVTISNYPYELTT 92

Query: 61 ESDASNVPLEKRRYTVKQLVDAAMISSANSAAIALAEHISGTESKFVDKMTAQLEKWGIH 120

SNVPL+KR+YTVK+L+ A ++++ANS AIALAE I GTE KFVDKM QL +WGI
Sbjct: 93 NYTISNVPLDKRKYTVKELLSALVVNNANSPAIALAEKIGGTEPKFVDKMKKQLRQWGIS 152

Query: 121 DSHLVNASGLNNSMLGNHIYPKSSQNDENKMSARDIAIVAYHLVNEYPSILKITSKSVAK 180

D+ +VN++GL N LG + YP + +DEN A D+AI+A HL+ E+P +LK++SKS
Sbjct: 153 DAKVVNSTGLTNHFLGANTYPNTEPDDENCFCATDLAIIARHLLLEFPEVLKLSSKSSTI 212

Query: 181 FDKDIMHSYNYMLPDMPVFRPGITGLKTGTTELAGQSFIATSTESGMRLLTVIMHADKAD 240

F ++SYNYML MP +R G+ GL G ++ AG SF+ATS E+ MR++TV+++AD++
Sbjct: 213 FAGQTIYSYNYMLKGMPCYREGVDGLWGYSKKAGASFVATSVENQMRVITVVLNADQSH 272

Query: 241 KDKYARFTATNSLLNYITNTYEPNLVIAKGAAYKGKFASVRDGKEQSVIAVAKNDLKVVQ 300

+D A F TN LL Y+ ++ ++ K V D E++V VA+N L ++
Sbjct: 273 EDD1AIFKTTNQLLQYLLINFQKVQLIENNKPV--KTLYVLDSPEKTVKLVAQNSLFFIK 330

Query: 301 KKNITKQNQLKINFKKE-LTAPITKKENLGKAYYVDLNKVGKGYLIKEPSVHLVAKDSIE 359

+ +N + I K + AP++K + LG+A D + +G+GYL PS++L+ + +I
Sbjct: 331 PlHTKTKNTVHITKZSSTMIAPLSKGQVIfiRATLQDKHLIGQGYLDTPPSINLILQKNIS 390

Query: 360 RSFFLKVWWNHFVRYVNEKL 379

+SFFLKVWWN FVRYVN L
Sbjct: 391 KSFFLKVWWNRFVRYVNTSL 410





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1856 

A DNA sequence (GBSx1963) was identified in S.agalactiae <SEQ ID 5765> which encodes the amino
acid sequence <SEQ ID 5766>. This protein is predicted to be penicillin binding protein 4 (pdp4) (dacA-1).
Analysis of this protein sequence reveals the following: 

Possible site: 23

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-12.58 Transmembrane 368 - 384 ( 363 - 394)

Final Results

bacterial membrane Certainty=0.6031 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA60582 GB:X87104 penicillin binding protein 4 [Staphylococcus aureus]

Identities = 117/333 (35%) , Positives = 188/333 (56%) , Gaps = 8/333 (2%)

Query: 5 IVSFLCILLSLTCVNSVQAEEHKDIMQITREAGY-DVKDINKPKASIVIDNKGHILWEDN 63

I+ LC+ LS+ + A +Q + GY + +P +++ + G +L++ N
Sbjct: 7 IIIILCLTLSIMTPYAQAANSDVTPVQAANQYGYAGLSAAYEPTSAVNVSQTGQLLYQYN 66

Query: 64 ADLERDPASMSKMFTLYLLFEDLAKGKTSLNTTVTATETDQAISKIYEISNNNIHAGVAY 123

D + +PASM+K+ T+YL E + KG+ SL+ TVT T + +S + E+SN ++ G +
Sbjct: 67 IDTKWNPASMTKLMTMYLTLEAVNKGQLSLDDTVTMTNKEYIMSTLPELSNTKLYPGQVW 126

Query: 124 PIRELITMTAVPSSNVATIMIANHLSQNNPDAFIKRINETAKKLGMTKTHFYNPSGAVAS 183

I +L+ +T SSN A +++A +S+N D F+ +N AK +GM THF NP+GA S
Sbjct: 127 TIADLLQITVSNSSNAAALIIAKKVSKNTSD-FVDLMNNKAKAIGMKNTHFVNPTGAENS 185

Query: 184 AFNGLYSPKEYDNNATNVTTARDLSILTYHFLKKYPDILNYTKYPEVKAMVGTPYEETFT 243

++P +Y + VTTARD +IL H +K+ P IL++T K + T + T+
Sbjct: 186 RLR-TFAPTKYKDQERTVTTARDYAILDLHVIKETPKILDFT-----KQLAPTTHAVTYY 239

Query: 244 TYNYSTPGAKFGLEGVDGLKTGSSPSAAFNALVTAKRQNTRLITVVLGVGDWSDQDGEYY 303

T+N+S GAK L G DGLKTGSS +A +N +T KR R+ V++G GD+ + GE
Sbjct: 240 TFNFSLEGAKMSLPGTDGLKTGSSDTANYNHTITTKRGKFRINQVIMGAGDYKNLGGEKQ 299

Query: 304 RHPFVNALVEKGFKDAKNISSKTPVLKAVKPKK 336

R+ NAL+E+ F K + + + + KK
Sbjct: 300 RNMMGNALMERSFDQYKYVKILSKGEQRINGKK 332

A related DNA sequence was identified in S.pyogenes <SEQ ID 5767> which encodes the amino acid
sequence <SEQ ID 5768>. Analysis of this protein sequence reveals the following:

Possible site: 23 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood =-15.18 Transmembrane 371 - 387 ( 364 - 392) 

Final Results

bacterial membrane Certainty=0.7071 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases:

>GP:CAA62899 GB:X91786 penicillin-binding protein 4 [Staphylococcus aureus]

Identities = 119/328 (36%), Positives = 184/328 (55%), Gaps = 19/328 (5%)

Query: 6 ILTIFTFICF--SVMPLVHAEDVMDIT-----RQAGYT-VSEVNRPKSSIVVDANSSDIL 57

+++I +C S+M D+T Q GY +S P S++ V + + +L
Sbjct: 4 LISIIIILCLTLSIMTPYAQATNSDVTPVQAANQYGYAGLSAAYEPTSAVNV-SQTGQLL 62

Query: 58 WQDNIDIPRDPASMSKMFTLYILFEELAKGKITMDTTITATPTDQAIANIYEISNNNIVA 117

+Q NID +PASM+K+ T+Y+ E + KG++++D T+T T + ++ + E+SN +
Sbjct: 63 YQYNIDTKWNPASMTKLMTMYLTLEAVNKGQLSLDDTVTMTNKEYIMSTLPELSNTKLYP 122

Query: 118 GVAYPIRDLITMTAVPSSNAATVMIANYLSNNDASAFIDRVNATAKQLGMTNTHFSNASG 177

G + I DL+ +T SSNAA +++A +S N S F+D +N AK +GM NTHF N +G
Sbjct: 123 GQVWTIADLLQITVSNSSNAAALIIAKKVSKN-TSDFVDLMNNKAKAIGMKNTHFVNPTG 181

Query: 178 AAAQAFQGYYNPTKYDLSASNITTARDLSKLLYAFLKKYPEIISFTNKSVVHTMVGTPYE 237

A + + PTKY +TTARD + L +K+ P+I+ FT + T+ T
Sbjct: 182 AENSRLR-TFAPTKYKDQERTVTTARDYAILDLHVIKETPKILDFTKQLAPTTLAVT--- 237

Query: 238 EEFHTYNHSLPDNQFGMKGVDGLKTGSSPSAAFNAMITAKRGKTRLITIVMGVGDWSDQN 297

++T+N SL + + G DGLKTGSS +A +N IT KRGK R+ ++MG GD+ +
Sbjct: 238 --YYTFNFSLEGAKMSLPGTDGLKTGSSDTANYNHTITTKRGKFRINQVIMGAGDYKNLG 295

Query: 298 GEFYRHPFVNALTEKGF---KDSKTLSK 322

GE R+ NAL E+ F K K LSK
Sbjct: 296 GEKQRNMMGNALMERSFDQYKYVKILSK 323



An alignment of the GAS and GBS proteins is shown below.

Identities = 226/382 (59%) , Positives = 289/382 (75%) , Gaps = 7/382 (1%) 



Query: 12 LLSLTCVNSVQAEEHKDIMQITREAGYDVKDINKPKASIVID-NKGHILWEDNADLERDP 70 
+ + C + + +D+M ITR+AGY V ++N+PK+SIV+D N ILW+DN D+ RDP 






Sbjct: 9 IFTFICFSVMPLVHAEDVMDITRQAGYTVSEVNRPKSSIVVDANSSDILWQDNIDIPRDP 68

Query: 71 ASMSKMFTLYLLFEDLAKGKTSLNTTVTATETDQAISKIYEISNNNIHAGVAYPIRELIT 130

ASMSKMFTLY+LFE+LAKGK +++TT+TAT TDQAI+ IYEISNNNI AGVAYPIR+LIT
Sbjct: 69 ASMSKMFTLYILFEELAKGKITMDTTITATPTDQAIANIYEISNNNIVAGVAYPIRDLIT 128

Query: 131 MTAVPSSNVATIMIANHLSQNNPDAFIKRINETAKKLGMTKTHFYNPSGAVASAFNGLYS 190

MTAVPSSN AT+MIAN+LS N+  AFI R+N TAK+LGMT THF N SGA A AF G Y+
Sbjct: 129 MTAVPSSNAATVMIANYLSNNDASAFIDRVNATAKQLGMTNTHFSNASGAAAQAFQGYYN 188

Query: 191 PKEYDNNATNVTTARDLSILTYHFLKKYPDILNYTKYPEVKAMVGTPYEETFTTYNYSTP 250

P +YD +A+N+TTARDLS L Y FLKKYP+I+++T    V  MVGTPYEE F TYN+S P
Sbjct: 189 PTKYDLSASNITTARDLSKLLYAFLKKYPEIISFTNKSVVHTMVGTPYEEEFHTYNHSLP 248

Query: 251 GAKFGLEGVDGLKTGSSPSAAFNALVTAKRQNTRLITVVLGVGDWSDQDGEYYRHPFVNA 310

+FG++GVDGLKTGSSPSAAFNA++TAKR  TRLIT+V+GVGDWSDQ+GE+YRHPFVNA
Sbjct: 249 DNQFGMKGVDGLKTGSSPSAAFNAMITAKRGKTRLITIVMGVGDWSDQNGEFYRHPFVNA 308

Query: 311 367

L EKGFKD+K +S K L+ + P+ TK +T S Q+ + K+ + + + F+ +
Sbjct: 309 LTEKGFKDSKTLSKKARQKLEKLVPQ TKKETSSKQQHFKATKKQSYLERVEDFMNHN 365

Query: 368 FVSILIVLGTIAILCLLAGIVL 389

+LI L I LL +V+
Sbjct: 366 HTFLLICLAIFIITILLLSLVV 387





A related GBS gene <SEQ ID 8917> and protein <SEQ ID 8918> were also identified. Analysis of this
protein sequence reveals the following: 

Lipop: Possible site: -1 Crenel: 9 
McG: Discrim Score: -14.02 
GvH: Signal Score (-7.5): -2.54 

Possible site: 60
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -12.58 threshold: 0.0 

INTEGRAL Likelihood =-12.58 Transmembrane 339 - 355 ( 334 - 365) 
PERIPHERAL Likelihood =1.38 99 
modified ALOM score: 3.02 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.6031 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases:

ORF01254(301 - 1386 of 1698)

EGAD|40430|42591(32 - 419 of 431) penicillin binding protein 4 (pdp4) {Staphylococcus aureus}
GP|1125682|emb|CAA60585.1||X87105 penicillin binding protein 4 {Staphylococcus aureus}
GP|1125686|emb|CAA60582.1||X87104 penicillin binding protein 4 {Staphylococcus aureus}
%Match =17.3

%Identity =36.3 %Similarity =59.6

Matches = 123 Mismatches =130 Conservative Sub.s = 79

264 294 324 351 381 411 441 471

FPLHFIIPDLCKLCAS*RHKDIMQITREAGY-DVKDINKPKASIVIDNKGHILWEDNADLERDPASMSKMFTLYLLFEDL
=1 = II : =1 = = = = h = l = = I I = :||||:|: h|| :| s
ILCLTLSIMTPYAQAANSDVTPVQAANQYGYAGLSAAYEPTSAVNVSQTGQLLYQYNIDTKWNPASMTKLMTMYLTLEAV
20 30 40 50 60 70 80

501 531 561 591 621 651 681 711

AKGKTSLNTTVTATETDQAISKIYEISNNNIHAGVAYPIRELITMTAVPSSNVATIMIANHLSQNNPDAFIKRINETAK^

11= 11= III I = =1 = hll == I = I =1= =1 III I =::| :|:| | |: :| ||
NKGQLSLDDTVTMTNKEYIMSTLPELSNTKLYPGQVWTl^
100 110 120 130 140 150 160

741 771 801 831 861 891 921 951

LGMTKTHFYNPSGAVASAFNGLYSPKEYDI^ATNVTTARDL^^

IGMKNTHFVNPTGAENSR-LRTFAPTKYKDQERTVTTARDYAILDLHVIKETPKILDFTK- QLAPTTHAVTYYTFN
180 190 200 210 220 230 240

981 1011 1041 1071 1101 1131 1161

YSTPGAKFGLEGVDGLKTGSSPSAAFNALVTAKRQNTRLITVVLGVGDWSDQDGEYYRHPFVNALVEKGFKDAK-

FSLEGAKMSLPGTDGLKTGSSDTANYNHTITTKRGKFRINQVIMGAGDYKNLGGEKQRNMMGNALMERSFDQYKYVKILS
260 270 280 290 300 310 320

340 350 360 370 380 390 400

1296 1326 1356 1386 1416 1446 1476 1506

TKEQWTOICrDQFIQSHFVSILIVLGTIAILCLLAGIVLLIKRSR**LC*YKSPLHQ*HRGFLLSLEIFN*PTEPSIS*EI

410 420


SEQ ID 8918 (GBS379) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 68 (lane 5; MW 44kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 72 (lane 3; MW 68.9kDa). 

GBS379-GST was purified as shown in Figure 212, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1857 

A DNA sequence (GBSx1964) was identified in S.agalactiae <SEQ ID 5769> which encodes the amino
acid sequence <SEQ ID 5770>. Analysis of this protein sequence reveals the following: 

Possible site: 49

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4039 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15256 GB:Z99120 similar to hypothetical proteins [Bacillus subtilis]
Identities = 316/459 (68%) , Positives = 386/459 (83%)

Query: 14 DLGEYKFGFHDDVKPIYSTGKGLNEAVIRELSAAKGEPEWMLDFRLKSLETFNKMPMQTW 73

D+GEYK+GFHD I+ + +GL + ++ E+S K EP+WMLDFRLKSLE F MPM W
Sbjct: 7 DIGEYKYGFHDKDVSIFRSERGLTKEIVEEISRMKEEPQWMLDFRLKSLEHFYNMPMPQW 66

Query: 74 GADLSDIDFDDIIYYQKASDKPARDWDDVPEKIKETFERIGIPEAERAYLAGASAQYESE 133

G DL+ ++FD+I YY K S++ R WD+VPE+IK+TF+++GIPEAE+ YLAG SAQYESE
Sbjct: 67 GGDMSIjNFDEITYYVKPSERSERSWDEVPEEIKQTFDKLGIPEAEQKYLAGVSAQYESE 126

Query: 134 VVYHNMKEEYDKLGIVFTDTDSALKEYPELFKKYFAKLVPPTDNKLAALNSAVWSGGTFI 193

VVYHNMKE+ + GIVF DTDSALKE ++F++++AK++PPTDNK AALNSAVWSGG+FI
Sbjct: 127 VVYHNMKEDLEAQGIVFKDTDSALKENEDIFREHWAKVIPPTDNKFAALNSAVWSGGSFI 186

Query: 194 YVPKGVKVDIPLQTYFRINNENTGQFERTLIIVDEGASVHYVEGCTAPTYSSNSLHAAIV 253

YVPKGVKV+ PLQ YFRIN+EN GQFERTLIIVDE ASVHYVEGCTAP Y++NSLH+A+V
Sbjct: 187 YVPKGVKVETPLQAYFRINSENMGQFERTLIIVDEEASVHYVEGCTAPVYTTNSLHSAVV 246

Query: 254 EIFALDGAYMRYTTIQNWSDNVYNLVTKRATAKKDATVEWIDGNLGAKTTMKYPSVYLDG 313

EI G Y RYTTIQNW++NVYNLVTKR +++AT+EWIDGN+G+K TMKYP+ L G
Sbjct: 247 EIIVKKGGYCRYTTIQNWAtmVYNLVTKRWCEEimTMEWIDGNIGSKLTMKYPACILKG 306

Query: 314 EGARGTMLSIAFANKGQHQDTGAKMIHNAPHTSSSIVSKSIAKGGGKVDYRGQVTFNKDS 373

EGARG LSIA A KGQHQD GAKMIH AP+TSS+IVSKSI+K GGKV YRG V F + +
Sbjct: 307 EGARGMTLSIAIAGKGQHQDAGAKMIHLAPNTSSTIVSKSISKQGGKVTYRGIVHFGRKA 366

Query: 374 KKSVSHIECDTILMDDISKSDTIPFNEIHNSQVALEHEAKVSKISEEQLYYLMSRGLSEA 433

+ + S+IECDT++MD+ S SDTIP+NEI N ++LEHEAKVSK+SEEQL+YLMSRG+SE
Sbjct: 367 EGARSNIECDTLIMDNKSTSDTIPYNEILNDNISLEHEAKVSKVSEEQLFYLMSRGISEE 426

Query: 434 EATEMIVMGFVEPFTKELPMEYAVELNRLISYEMEGSVG 472

EATEMIVMGF+EPFTKELPMEYAVE+NRLI +EMEGS+G
Sbjct: 427 EATEMIVMGFIEPFTKELPMEYAVEMNRLIKFEMEGSIG 465



A related DNA sequence was identified in S.pyogenes <SEQ ID 5771> which encodes the amino acid
sequence <SEQ ID 5772>. Analysis of this protein sequence reveals the following:

Possible site: 47

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.3780 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 445/472 (94%) , Positives = 461/472 (97%) 

Query: 1 MSEINEKVEPQPIDLGEYKFGFHDDVKPIYSTGKGLNEAVIRELSAAKGEPEWMLDFRLK 60

MS+INEKVEP+PIDLG+Y+FGFHDDV+PIYSTGKGL+EAV+RELSAAK EPEWML+FRLK
Sbjct: 1 MSDINEKVEPKPIDLGDYQFGFHDDVEPIYSTGKGLSEAVVRELSAAKNEPEWMLEFRLK 60

Query: 61 SLETFNKMPMQTWGADLSDIDFDDIIYYQKASDKPARDWDDVPEKIKETFERIGIPEAER 120

SLETFNKMPMQTWGADLSDI+FDDIIYYQKASDKPAR WDDVPEKIKETF+RIGIPEAER
Sbjct: 61 SLETFNKMPMQTWGADLSDINFDDIIYYQKASDKPARSWDDVPEKIKETFDRIGIPEAER 120

Query: 121 AYLAGASAQYESEVVYHNMKEEYDKLGIVFTDTDSALKEYPELFKKYFAKLVPPTDNKLA 180

AYLAGASAQYESEVVYHNMK E++KLGI+FTDTDSALKEYP+LFK+YFAKLVPPTDNKLA
Sbjct: 121 AYLAGASAQYESEVVYHNMKGEFEKLGIIFTDTDSALKEYPDLFKQYFAKLVPPTDNKLA 180

Query: 181 ALNSAVWSGGTFIYVPKGVKVDIPLQTYFRINNENTGQFERTLIIVDEGASVHYVEGCTA 240

ALNSA WSGGTFIYVPKGVKVDIPLQTYFRINNENTGQFERTLIIVDEGASVHYVEGCTA
Sbjct: 181 ALNSAAWSGGTFIYVPKGVKVDIPLQTYFRINNENTGQFERTLIIVDEGASVHYVEGCTA 240

Query: 241 PTYSSNSLHAAIVEIFALDGAYMRYTTIQNWSDNVYNLVTKRATAKKDATVEWIDGNLGA 300

PTYSSNSLHAAIVEIFALDGAYMRYTTIQNWSDNVYNLVTKRA A  DATVEWIDGNLGA
Sbjct: 241 PTYSSNSLHAAIVEIFALDGAYMRYTTIQNWSDNVYNLVTKRARALTDATVEWIDGNLGA 300

Query: 301 KTTMKYPSVYLDGEGARGTMLSIAFANKGQHQDTGAKMIHNAPHTSSSIVSKSIAKGGGK 360

KTTMKYPSVYLDG GARGTMLSIAFAN GQHQDTGAKMIHNAPHTSSSIVSKSIAK GGK
Sbjct: 301 KTTMKYPSVYLDGPGARGTMLSIAFANAGQHQDTGAKMIHNAPHTSSSIVSKSIAKSGGK 360

Query: 361 VDYRGQVTFNKDSKKSVSHIECDTILMDDISKSDTIPFNEIHNSQVALEHEAKVSKISEE 420

VDYRGQVTFNK SKKSVSHIECDTILMDDISKSDTIPFNEIHNSQVALEHEAKVSKISEE
Sbjct: 361 VDYRGQVTFNKQSKKSVSHIECDTILMDDISKSDTIPFNEIHNSQVALEHEAKVSKISEE 420

Query: 421 QLYYLMSRGLSEAEATEMIVMGFVEPFTKELPMEYAVELNRLISYEMEGSVG 472

QLYYLMSRGLSE+EATEMIVMGFVEPFTKELPMEYAVELNRLISYEMEGSVG
Sbjct: 421 QLYYLMSRGLSESEATEMIVMGFVEPFTKELPMEYAVELNRLISYEMEGSVG 472

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1858 

A DNA sequence (GBSx1965) was identified in S.agalactiae <SEQ ID 5773> which encodes the amino
acid sequence <SEQ ID 5774>. This protein is predicted to be nitrogen fixation protein (nifU). Analysis of
this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.1078 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15257 GB:Z99120 similar to NifU protein homolog [Bacillus subtilis] 
Identities = 72/139 (51%) , Positives = 92/139 (65%) 

Query: 4 SKLDNLYMAVVADHSKHPHHHGFLEGVEQVQLNNPTCGDVISLSVKFDGNIISDIAFAGN 63

+ LD LY V+ DH K+P + G L V +NNPTCGD I L++K DG+I+ D F G
Sbjct: 5 ANLDTLYRQVIMDHYKNPRNKGVLNDSIVVDMNNPTCGDRIRLTMKLDGDIVEDAKFEGK 64

Query: 64 GCTISTASSSMMTDAVIGKTKEEALQLADVFSKMVQGDQNPKQEKLGDAEFLAGVSKFPQ 123

GC+IS AS+SMMT A+ GK E AL ++ +FS M+QG + LGD E L GVSKFP
Sbjct: 65 GCSISMASASMMTQAIKGKDIETALSMSKIFSDMMQGKEYDDSIDLGDIEALQGVSKFPA 124

Query: 124 RIKCATLSWNALRKAIERD 142

RIKCATLSW AL K + ++
Sbjct: 125 RIKCATLSWKALEKGVAKE 143

A related DNA sequence was identified in S.pyogenes <SEQ ID 5775> which encodes the amino acid 
sequence <SEQ ID 5776>. Analysis of this protein sequence reveals the following: 

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1202 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 114/146 (78%) , Positives = 133/146 (91%) 

Query: 1 MALSKLDNLYMAVVADHSKHPHHHGFLEGVEQVQLNNPTCGDVISLSVKFDGNIISDIAF 60

MALSKL++LYMAVVADHSK PHHHG L+GVE VQLNNPTCGDVISL+VKFD + I DIAF
Sbjct: 1 MALSKI^LYmWADHSKRPHHHGQLDGViaVQl^PTCGDVISLTVKFDEDKIEDIAF 60

Query: 61 AGNGCTISTASSSMMTDAVIGKTKEEALQLADVFSKMVQGDQNPKQEKLGDAEFLAGVSK 120

AGNGCTISTASSSMMTDAVIGK+KEEAL LAD+FS+MVQG +NP Q++LG+AE LAGV+K
Sbjct: 61 AGNGCTISTASSSMMTDAVIGKSKEEALALADIFSEMVQGQENPAQKELGEAELLAGVAK 120

Query: 121 FPQRIKCATLSWNALRKAIERDNQAE 146

FPQRIKC+TL+WNAL++AI+R A+
Sbjct: 121 FPQRIKCSTLAWNALKEAIKRSANAQ 146



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1859

A DNA sequence (GBSx1966) was identified in S.agalactiae <SEQ ID 5777> which encodes the amino
acid sequence <SEQ ID 5778>. This protein is predicted to be nitrogen fixation protein (nifS) (b1680).
Analysis of this protein sequence reveals the following: 

Possible site: 43

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2453 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15258 GB:Z99120 similar to NifS protein homolog [Bacillus subtilis]
Identities = 240/400 (60%) , Positives = 306/400 (76%) , Gaps = 5/400 (1%)

Query: 9 LKQDFPILNQLVNDEPLIYLDNAATTQKPNQVLEALRDYYQNDNANVHRGVHTLAERATA 68

+++ FPIL+Q VN L+YLD+AAT+QKP V+E L YY N+NVHRGVHTL RAT
Sbjct: 6 IRKQFPILHQQVNGHDLVYLDSAATSQKPRAVIETLDKYYNQYNSNVHRGVHTLGTRATD 65

Query: 69 QYENAREKARQFLNAKLSKEILFTRGTTTGLNWVA-KFAESILERGDEVLISIMEHHSNI 127

YE AREK R+F+NAK EI+FT+GTTT LN VA +A + L+ GDEV+I+ MEHH+NI
Sbjct: 66 GYEGAREKVRKFINAKSMAEIIFTKGTTTSLNMVALSYARANLKPGDEVVITYMEHHANI 125

Query: 128 IPWQQACERTGAKLVYAYLK-DGSLDLEDFYNKLSSKTKFVSLAHISNVLGCVTPVKAIA 186

IPWQQA + TGA L Y L+ DG++ LED ++S TK V+++H+SNVLG V P+K +A
Sbjct: 126 IPWQQAVKATGATLKYIPLQEDGTISLEDVRETVTSNTKIVAVSHVSNVLGTVNPIKEMA 185

Query: 187 ERVHQVGAYMVVDGAQSAPHMAIDVQDLDCDFFALSGHKMLGPTGIGVLYGKESILDKMP 246

+ H GA +VVDGAQS PHM IDVQDLDCDFFALS HKM GPTG+GVLYGK+++L+ M
Sbjct: 186 KIAHDNGAVIVVDGAQSTPHMKIDVQDLDCDFFALSSHKMCGPTGVGVLYGKKALLENME 245

Query: 247 PVEFGGEMIDFVYEQSATWKELPWKFEAGTPNIAGAIAFGEALDYLTDVGMDEIHQYEQS 306

P EFGGEMIDFV +TWKELPWKFEAGTP IAGAI G A+D+L ++G+DEI ++E
Sbjct: 246 PAEFGGEMIDFVGLYESTWKELPWKFEAGTPIIAGAIGLGAAIDFLEEIGLDEISRHEHK 305

Query: 307 LVSYVLPKLQAIDGLTIYGPSDAESHVGVIAFNLEGLHPHDVATAMDYEGVAVRAGHHCA 366

L +Y L + + +DG+T+YGP E G++ FNL+ +HPHDVAT +D EG+AVRAGHHCA
Sbjct: 306 LAAYALERFRQLDGVTVYGP---EERAGLVTFNLDDVHPHDVATVLDAEGIAVRAGHHCA 362

Query: 367 QPLINHLGIHSAVRASFYFYNTKEDCDKLVDAIQKTKEFF 406

QPL+ L + + RASFY YNT+E+ DKLV+A+QKTKE+F
Sbjct: 363 QPLMKWLDVTATARASFYLYNTEEEIDKLVEALQKTKEYF 402

A related DNA sequence was identified in S.pyogenes <SEQ ID 5779> which encodes the amino acid
sequence <SEQ ID 5780>. Analysis of this protein sequence reveals the following:

Possible site: 41 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.3714 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below.

Identities = 293/408 (71%) , Positives = 349/408 (84%) 

Query: 3 LLDSYKLKQDFPILNQLVNDEPLIYLDNAATTQKPNQVLEALRDYYQNDNANVHRGVHTL 62

LLD+ +KQDF ILNQ VNDEPL+YLDNAATTQKP VLEAL+ YYQ DNANVHRGVHTL
Sbjct: 1 LLDAKDIKQDFQILNQQVNDEPLVYLDNAATTQKPALVLEALQSYYQEDNANVHRGVHTL 60

Query: 63 AERATAQYENAREKARQFLNAKLSKEILFTRGTTTGLNWVAKFAESILERGDEVLISIME 122

AERAT +YE +R++ F++AK SKE+LFTRGTTT LNWVA+FAE +L DEVLISIME
Sbjct: 61 AERATLKYEASRQQVADFIHAKSSKEVLFTRGTTTSLNWVARFAEQVLTPEDEVLISIME 120

Query: 123 HHSNIIPWQQACERTGAKLVYAYLKDGSLDLEDFYNKLSSKTKFVSLAHISNVLGCVTPV 182

HH+NIIPWQQAC++TGA+LVY YLKDG LD++D NKL++KT+FVSL H+SNVLGC+ P+
Sbjct: 121 HHANIIPWQQACQKTGARLVYVYLKDGQLDITODLftNKLTTKTRFVSLVHVSNVLGCINPI 180

Query: 183 KAIAERVHQVGAYMVVDGAQSAPHMAIDVQDLDCDFFALSGHKMLGPTGIGVLYGKESIL 242

K IA+ H GAY+VVDGAQS PH+AIDVQDLDCDFFA S HKMLGPTG+GVLYGKE +L
Sbjct: 181 KEIAKLAHAKGAYLVVDGAQSVPHLAIDVQDLDCDFFAFSAHKMLGPTGLGVLYGKEELL 240

Query: 243 DKMPPVEFGGEMIDFVYEQSATWKELPWKFEAGTPNIAGAIAFGEALDYLTDVGMDEIHQ 302

+++ P+EFGGEMIDFVYEQ ATWKELPWKFEAGTP+IAGAI A+ YL +GM +IH
Sbjct: 241 NQVEPLEFGGEMIDFVYEQEATWKELPWKFEAGTPHIAGAIGLSAAISYLQRLGMADIHA 300

Query: 303 YEQSLVSYVLPKLQAIDGLTIYGPSDAESHVGVIAFNLEGLHPHDVATAMDYEGVAVRAG 362

+E L++YVLPKL+AI+GLTIYGPS + G+I+FNL+ LHPHD+ATA+DYEGVAVRAG
Sbjct: 301 HEAELIAYVLPKLEAIEGLTIYGPSQPSARSGLISFNLDDLHPHDLATALDYEGVAVRAG 360

Query: 363 HHCAQPLINHLGIHSAVRASFYFYNTKEDCDKLVDAIQKTKEFFNGTL 410 

HHCAQPL+++LG+ + VRASFY YNTK DCD+LV+AI K KEFFNGTL 
Sbjct: 361 HHCAQPLLSYLGVPATVRASFYIYNTKADCDRLVEAILKAKEFFNGTL 408 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1860 

A DNA sequence (GBSx1967) was identified in S.agalactiae <SEQ ID 5781> which encodes the amino
acid sequence <SEQ ID 5782>. Analysis of this protein sequence reveals the following: 

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1441 (Affirmative) < suco

bacterial membrane --- Certainty=0.0000 (Not Clear) < suco

bacterial outside --- Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB07189 GB:AP001518 unknown conserved protein [Bacillus halodurans]
Identities = 171/430 (39%) , Positives = 267/430 (61%) , Gaps = 15/430 (3%)

Query: 1 MSKEAILNFLQAKGEPTWLQELRLKAFEKIEELELPVIERVKFHRWNLG--DGTILENDY 58 

+ KE + +F A+ EP W +++RLK FE +E LELP ++ K WN D + E 
Sbjct: 9 IDKEYVQSFSDARNEPQWFKDIRLKGFELVETLELPKPDKTKITSWNFTNFDHKLPEVSP 68 

Query: 59 TANVPDFTE LGNNPKLVQIGTQTVLEQVPMELIEKGWFTDFYSALEEIPE 109 

A++ + + LVQ V ++ Li KGV+FTD +A++E + 

Sbjct: 69 VASIDELRDEVKGLIGEASDTQNLLVQRDATWYSKLDEALKAKGVIFTDLLTAVKEHGD 128 

Query: 110 VIERYFGK-ARPFEEDRLAAYHTAYFNSGAVLYIPDNVEITQPIEGLFYQDSQSKVPFNK 168

++E+Y+ K A +E+RL AHA N G +Y+P NVEI P++ +F+ D++ FN 
Sbjct: 129 LvEKYYMKDAVKvDENRLTALHAALVNGGTFIYvPRNVEIEVPLQSVFWFDTEKAGLFN- 187 

Query: 169 HILLIVGKNAKVSYLERFESIGDGTERTSANISVEVIAQAGSQIKFASIDRLGENVTTFI 228 
H++++ N+ ++Y+E + S G +E ANI VEV A A +++ F ++D L VTT++

Sbjct: 188 HVI IVAEDNSS ITYVENYASFG- -SEEAVANIVTOVFAGANAKVSFGAVDNLAAG VTTYV 245 

Query: 229 SRRGRHSSDATIDWALGVMNEGNWADFDSDLIGKSSHANLKVVAASSGRQVQGIDTRVT 288 
RR D+ ++WALG MN+GN V++ + L+GD S A+ K V+ G Q Q T++ 

Sbjct: 246 VRRAHVGRDSRVEWALGQMNDGNTVSENTTHLIX3DNSWADTKTVSVGRGEQKQNFTTQIF 305

Query: 289 NYGCMSVGHILQHGVILERGTLTFNGIGHIIKGAKGADAQQESRVLMLSDKRRSDANPIL 348 



++G +S G+IL+HGV+ E T FNGI I GA + +Q RVLMLS+KAR DANPIL
Sbjct: 306 HHGKHSEGYILKHGVMREAATSIFNGISKIEHGATKSHGEQTERVLMLSEKARGDANPIL 365 

Query: 349 LIDENDVTAGHAAS IGQVDPEDLYYLMSRGLNQKTAEQLVIRGFLGTVIAEI PVKEVRDE 408 

LIDE+DVTAGHAAS+G++DP ++YLMSRG+++ AE+LVI GFL V+ ++P++ V++
Sbjct: 366 LIDEDDVTAGHAASVGKIDPIQMFYLMSRGISRAEAERLVIHGFLAPWGQLPIESVKER 425 

Query: 409 MIAVIDTKLE 418 

++ 1+ K++ 
Sbjct: 426 LVEAIERKVK 435 
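
The summary line that precedes each alignment (for example "Identities = 171/430 (39%)") carries a match count, the aligned length, and a reported percentage. As a minimal sketch, and not part of the original analysis, the following Python reads those three values while tolerating the irregular spacing seen in this text. The function name `parse_alignment_summary` is hypothetical, and the percentages are taken as printed rather than recomputed.

```python
import re

# Matches e.g. "Identities = 171/430 (39%)" with optional stray spaces.
_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)

def parse_alignment_summary(line):
    """Return {field: (count, aligned_length, reported_percent)}."""
    return {
        name.lower(): (int(count), int(length), int(percent))
        for name, count, length, percent in _FIELD.findall(line)
    }

summary = parse_alignment_summary(
    "Identities = 171/430 (39%) , Positives = 267/430 (61%) , Gaps = 15/430 (3%)"
)
print(summary["identities"])  # (171, 430, 39)
```

A line without any of the three fields simply yields an empty dictionary, so the helper can be run over every line of a report.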

A related DNA sequence was identified in S.pyogenes <SEQ ID 5783> which encodes the amino acid 
sequence <SEQ ID 5784>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.80 Transmembrane 387 - 403 ( 387 - 403) 

Final Results 

bacterial membrane Certainty=0.1319 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases: 

>GP:CAB15259 GB:Z99120 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 177/428 (41%) , Positives = 267/428 (62%) , Gaps = 15/428 (3%) 

Query: 3 KEKLVAFSQAHAEPAWLQERRIAALEAIPNLELPTIERVKFHRWNLGDGT--LTENESLA 60 

+E L +FS+ H EPAWL+ RL ALE +L +P ++ K WN + +NE L+ 

Sbjct: 11 QEYLKSFSEKHQEPAWLKNLRLQALEQAEDLPMPKPDKTKITNWNFTNFAKHTVDNEPLS 70 

Query: 61 SVPDF IAIGDNPKLVQVGTQTVLEQLPMA--LIDKGWFSDFYTALEEIPEVI 111 

S+D II+K + V L++L DKGV+F+D TA E +++ 

Sbjct: 71 SLEDLTDEVKALIDIENEDKTLYVQRDQTPAHLSLSQELKDKGVIFTDILTAAREHSDLV 130 

Query: 112 EAHFGQ-ALAFDEDKLAAYHTAYFNSAAVLYVPDHLEITTPIEAIFLQDSDSDVPFNKHV 170 

E+F+ + DEKLAHA N A LYVP ++++ TP++A+++ +S+ FN HV 
Sbjct: 131 EKYFMKDGVKVDEHKLTALHAALWGGAFLYTOKNVQVETPVQAVYVHESNDTALFN-HV 189 

Query: 171 LVIAGKESKFTYLERFESIGNATQKISANISVEVIAQAGSQIKFSAIDRLGPSVTTYISR 230 

L++A S TY+E + S N + NI EVI + + + A+D L VTTY++R 
Sbjct: 190 LIVAEDHSS VTYVENYI STVNPKDAVF-NI ISEVITGDNASVTYGAVDNLSSGVTTYVNR 248 

Query: 231 RGRLE-KDANIDWALAVMNEGNVIADFDSDLIGQGSQADLKWAASSGRQVQGIDTRVTN 289 

RG +D+ I+WAL +MN+G+ I++ ++L G G+ D K V G Q + T++ + 
Sbjct: 249 RGAARGRDSKIEWALGLMNDGDTISEOTTNLYGDGTYGDTKTVWGRGEQTENFTTQIIH 308 

Query: 290 YGQRTVGHILQHGVILERGTLTFNGIGHILKDAKGADAQQESRVLMLSDQARADANPILL 349 

+G+ + G+IL+HGV+ + + FNGIG I A A+A+QESRVLMLS++AR DANPILL 
Sbjct: 309 FGKASEGYILKHGvMKDSASSIFNGIGKIEHGASKANAEQESRVLMLSEKARGDANPILL 368 

Query: 350 IDENEVTAGHAASIGQVDPEDMYYLMSRGLDQETAERLVIRGFLGAVIAEIPIPSVRQEI 409 

IDE++VTAGHAAS+G+VDP +YYLMSRG+ +E AERLVI GFL V+ E+PI V++++ 
Sbjct: 369 IDEDDVTAGHAASVGRVDPIQLYYLMSRGIPKEEAERLVIYGFLAPVVNELPIEGVKKQL 428 

Query: 410 IKVLDEKL 417 

+ V++ K+ 
Sbjct: 429 VSVIERKV 436 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 322/420 (76%) , Positives = 368/420 (86%) 



Query: 1 MSKEAILNFLQAKGEPTWLQELRLKAFEKIEELELPVTERVKFHRWNLGDGTILENDYTA 60
M+KE ++ F QA EP WLQE RL A E I LELP IERVKFHRWNLGDGT+ EN+ A
Sbjct: 1 MTKEKLVAFSQAHAEPAWLQERRIAALEAIPmELPTIERVKFHRWNLGDGTLTENESLA 60

Query: 61 OTPDFTELGNNPKLVQIGTQTVLEQVPMELIERGWFTDFYSALEEIPEVIERYFGKARP 120
+VPDF +G+NPKLVQ+GTQTVLEQ+PM LI+KGWF+DFY+ALEEIPEVIE +FG+A
Sbjct: 61 SVPDFIAIGDNPKLVQVGTQTVLEQLPMALIDKGWFSDFYTALEEIPEVIEAHFGQALA 120

Query: 121 FEEDRLAAYHTAYFNSGAVLYIPDNVEITQPIEGLFYQDSQSKVPFNKHILLIVGKNAKV 180
F+ED+LAAYHTAYFNS AVLY+PD++EIT PIE +F QDS S VPFNKH+L+I GK +K
Sbjct: 121 FDEDKLAAYHTAYFNSAAVLYVPDHLEITTPIEAIFLQDSDSDVPFNKHVLVIAGKESKF 180

Query: 181 SYLERFESIGDGTERTSANISVEVIAQAGSQIKFASIDRLGENVTTFISRRGRHSSDATI 240
+YLERFESIG+ T++ SANI SVEVIAQAGSQIKF++IDRLG +VTT+ 1 SRRGR DA I
Sbjct: 181 TYLERFESIGNATQKISANISVEVIAQAGSQIKFSAIDRLGPSVTTYISRRGRLEKDANI 240

Query: 241 DWALGVMffiGNWADFDSDLIGDGSHANLI<VVARSSGRQVQGIDTRVTNYGCNSVGHILQ 300
DWAL VMNEGNV+ADFDSDLIG GS A+LKWAASSGRQVQGIDTRVTNYG +VGHILQ
Sbjct: 241 DWALAVI^G]WIM)FDSDLIGQGSQADLKOTAASSGRQVQGIDTRVTNYGQRTVGHILQ 300

Query: 301 HGVILERGTLTFNGIGHIIKGAKGADAQQESRVLMLSDKARSDAMPILLIDENDVTAGHA 360
HGVI LERGTLTFNG IGH I + K AKGADAQQESRVLMLSD+AR+DANPILLIDEN+VTAGHA
Sbjct: 301 HGVILERGTLTFNGIGHILKDAKGADAQQESRVLMLSDQARADANPILLIDENEVTAGHA 360

Query: 361 ASIGQVDPEDLYYLMSRGLNQKTAEQLVIRGFLGTVIAEIPVKEVRDEMIAVIDTKLEKR 420
ASIGQVDPED+YYLMSRGL+Q+TAE+LVIRGFLG VIAEIP+ VR E+I V+D KL R
Sbjct: 361 ASIGQVDPEDMYYLMSRGLDQETAERLVIRGFLGAVIAEIPIPSVRQEIIKVLDEKLLNR 420



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1861 

A DNA sequence (GBSx1968) was identified in S.agalactiae <SEQ ID 5785> which encodes the amino
acid sequence <SEQ ID 5786>. This protein is predicted to be ABC transporter, ATP-binding protein,
Ycf16 family. Analysis of this protein sequence reveals the following:

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0 . 2253 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 
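
Each "Final Results" block lists three candidate locations with a certainty score, and the location marked "Affirmative" is the one with the highest score. The sketch below is an illustration only (the helper names `parse_final_results` and `most_likely_location` are hypothetical); it assumes the line layout shown above and tolerates the split numbers that appear in this text, such as "0 . 2253".

```python
import re

# One "Final Results" line: location name, certainty score, verdict.
_RESULT = re.compile(
    r"bacterial\s+(\w+)[^C]*Certainty\s*=\s*([0-9. ]+?)\s*\(([^)]*)\)"
)

def parse_final_results(lines):
    """Return a list of (location, certainty, verdict) tuples."""
    results = []
    for line in lines:
        match = _RESULT.search(line)
        if match:
            location, score, verdict = match.groups()
            # The number is sometimes split ("0 . 2253"); drop the spaces.
            results.append((location, float(score.replace(" ", "")), verdict))
    return results

def most_likely_location(lines):
    """Pick the location with the highest certainty score."""
    return max(parse_final_results(lines), key=lambda r: r[1])

example = [
    "bacterial cytoplasm Certainty=0 . 2253 (Affirmative) < suco",
    "bacterial membrane Certainty=0 . 0000 (Not Clear) < suco",
    "bacterial outside Certainty=0. 0000 (Not Clear) < suco",
]
print(most_likely_location(example))  # ('cytoplasm', 0.2253, 'Affirmative')
```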

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15260 GB:Z99120 similar to ABC transporter (ATP-binding 
protein) [Bacillus subtilis] 
Identities = 180/250 (72%) , Positives = 212/250 (84%) 



Query: 2 SVLEIKNLHVSIEDKEILKGLNLTLKTGEIAAIMGPNGTGKSTLSAAIMGNPNYEVTAGE 61
S L IK+LHV IE KEILKG+NL +K GE A+MGPNGTGKSTLSAAIMG+P YEVT G
Sbjct: 4 STLTIKDLHVEIEGKEILKGvNLEIKGGEFHAVMGPNGTGKSTLSAAIMGHPKYEVTKGS 63

Query: 62 ILFDGEDILELEVDERARLGLFLAMQYPSEVPGITNAEFIRAAMNAGKADDDKISIRQFI 121
I DG+D+LE+EVDERA+ GLFLAMQYPSE+ G+TNA+F+R+A+NA + + D+IS+ +FI
Sbjct: 64 ITLDGKDVLEMEVDERAQAGLFLAMQYPSEISGVTNADFLRSAINARREEGDEISLMKFI 123

Query: 122 TKLDEKMELLGMKEEMAERYLI^GFSGGEKKRNEILQLLMLEPKFALIiDEIDSGLDIDAL 181
K+DE ME L M EMA+RYLNEGFSGGEKKRNEILQL+M+EPK A+LDEIDSGLDIDAL
Sbjct: 124 RKMDENMEFLE^PEMAQRYI^GFSGGEKKRNEILQLMMIEPKIAILDEIDSGLDIDAL 183

Query: 182 KOTSKGVNEMRGEGFGAMIITHYQRLLNYITPDKVHW^ 241
KWSKG+N+MR E FG ++ ITHYQRLLNYITPD VHVMM G+W SGG ELA RLE EGY
Sbjct: 184 KVVSKGINK^SENFGCLMITHYQRLLNYITPDVvHVMMQGRWKSGGAELAQRLEAEGY 243
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Query: 242 AQIAEELGLE 251 

I +ELG+E 
Sbjct: 244 DWIKQELGIE 253 
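
In these alignment blocks the middle line repeats a residue wherever Query and Sbjct are identical and shows "+" for a conservative substitution; the Identities total counts the former, and the Positives total counts both. A small sketch of that bookkeeping follows, using the final ten-residue segment above. The helper names are hypothetical, and the middle line is written out column by column because the spacing of the printed middle lines in this text has been collapsed.

```python
def count_identities(query, sbjct):
    """Count aligned columns where both sequences carry the same residue."""
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")

def midline_counts(midline):
    """Return (identities, positives) read off a BLAST-style middle line."""
    identities = sum(1 for ch in midline if ch.isalpha())
    conservative = midline.count("+")
    return identities, identities + conservative

query = "AQIAEELGLE"
sbjct = "DWIKQELGIE"
midline = "  I +ELG+E"   # column-aligned version of the middle line

print(count_identities(query, sbjct))  # 5
print(midline_counts(midline))         # (5, 7)
```

Counting identities directly from the two sequences and reading them off the middle line should agree; a mismatch is a quick signal that a line has been damaged in transcription.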

A related DNA sequence was identified in S.pyogenes <SEQ ID 5787> which encodes the amino acid 
sequence <SEQ ID 5788>. Analysis of this protein sequence reveals the following: 

Possible site: 48 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2417 (Affirmative) < suco

bacterial membrane --- Certainty=0.0000 (Not Clear) < suco

bacterial outside --- Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 225/255 (88%), Positives = 241/255 (94%) 



Query: 1 MSVLEIKNLHVSIEDKEILKGLNLTLKTGEIAAIMGPNGTGKSTLSAAIMGNPNYEVTAG 60
MS+LEI NLHVSIE KEILKG+NLTLKTGE+AAIMGPNGTGKSTLSAAIMGNPNYEVT G
Sbjct: 1 MSILEINNLHVSIEGKEILKGVNLTLKTGEVAAIMGPNGTGKSTLSAAIMGNPNYEVTQG 60

Query: 61 EILFDGEDILELEVDERARLGLFLAMQYPSEVPGITNAEFIRAAMNAGKADDDKISIRQF 120
+IL DG +IL+LEVDERARLGLFLAMQYPSE+PGITNAEF+RAAMNAGKAD+DKIS+R F
Sbjct: 61 QILLDGVNILDLEVDERARLGLFLAMQYPSEIPGITNAEFMRAAMNAGKADEDKISVRDF 120

Query: 121 ITKLDEKMELLGMKEEMAERYLNEGFSGGEKKRNEILQLLMLEPKFALLDEIDSGLDIDA 180
ITKLDEKM LLGMKEEMRERYLlffiGFSGGEKICRNEILQLLMLEPKFAIiIiDEIDSGLDIDA
Sbjct: 121 ITKLDEKMAIZjGMKEEMAERYLNEGFSGGEKKRI^IIiQLLMLEPKFALLDEIDSGIiDIDA 180

Query: 181 LKWSKGVNEMRGEGFGAMIITHYQRLLNYITPDKVHVMMIXSKVVLSGGPEIiAVRLEKEG 240
LKWSRGVNEMRG+ FGAMIITHYQRLLNYITPD VHVMMDG++VLSG LA RLEKEG
Sbjct: 181 LKVVSKGVNEMRGKDFGAMIITHYQRLLNYITPDLVHVl^ 240

Query: 241 YAQ1AEELGLEYKEE 255
YA IA++LG+EYKEE
Sbjct: 241 YAGIAQDLGIEYKEE 255





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1862 

A DNA sequence (GBSx1969) was identified in S.agalactiae <SEQ ID 5789> which encodes the amino
acid sequence <SEQ ID 5790>. This protein is predicted to be RgpG (rfe). Analysis of this protein sequence 
reveals the following: 

Possible site: 40 

>>> Seems to have an uncleavable N-term signal seq 



INTEGRAL Likelihood = -12.10 Transmembrane 312 - 328 ( 308 - 336)

INTEGRAL Likelihood = -10.03 Transmembrane 15 - 31 ( 6 - 41)

INTEGRAL Likelihood = -9.82 Transmembrane 205 - 221 ( 197 - 226)

INTEGRAL Likelihood = -8.60 Transmembrane 335 - 351 ( 329 - 358)

INTEGRAL Likelihood = -7.48 Transmembrane 257 - 273 ( 255 - 281)

INTEGRAL Likelihood = -5.52 Transmembrane 60 - 76 ( 56 - 79)

INTEGRAL Likelihood = -5.31 Transmembrane 151 - 167 ( 148 - 171)

INTEGRAL Likelihood = -4.88 Transmembrane 91 - 107 ( 90 - 108)

INTEGRAL Likelihood = -4.78 Transmembrane 184 - 200 ( 177 - 203)

INTEGRAL Likelihood = -3.13 Transmembrane 119 - 135 ( 119 - 135)

INTEGRAL Likelihood = -2.97 Transmembrane 229 - 245 ( 229 - 250)



Final Results 

bacterial membrane Certainty=0.5840 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 8919> which encodes amino acid sequence <SEQ ID 8920> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: 5.18 
GvH: Signal Score (-7.5) : -6.19 

Possible site: 15 
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 9 value: -12.10 threshold: 0.0

INTEGRAL Likelihood = -12.10 Transmembrane 239 - 255 ( 235 - 263)

INTEGRAL Likelihood = -9.82 Transmembrane 132 - 148 ( 124 - 153)

INTEGRAL Likelihood = -8.60 Transmembrane 262 - 278 ( 256 - 285)

INTEGRAL Likelihood = -7.48 Transmembrane 184 - 200 ( 182 - 208)

INTEGRAL Likelihood = -5.31 Transmembrane 78 - 94 ( 75 - 98)

INTEGRAL Likelihood = -4.88 Transmembrane 18 - 34 ( 17 - 35)

INTEGRAL Likelihood = -4.78 Transmembrane 111 - 127 ( 104 - 130)

INTEGRAL Likelihood = -3.13 Transmembrane 46 - 62 ( 46 - 62)

INTEGRAL Likelihood = -2.97 Transmembrane 156 - 172 ( 156 - 177)

PERIPHERAL Likelihood = 12.63 284













modified ALOM score : 2.92 



*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.5840 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA82114 GB:AB022909 RgpG [Streptococcus mutans] 
Identities = 266/382 (69%) , Positives = 317/382 (82%) 

Query: 10 TIEYIFVLIGAFLLSIILTPIIRVISLKVGAVDKPNARRINKVPMPSSGGLAIFLSFVVT 69 

T++++ VLI L S++LTP++R +L+VGAVD PNARRINKVPMPS+GGLAI +SFV+ 
Sbjct: 7 TLKFVLVLIATLLTSLVLTPLVRFFALRVGAVDNPNARRINKVPMPSAGGLAIIISFVIA 66 

Query: 70 TLFFMPMAASRHFIEVSYFHYILPVIIGGLWTTTGFIDDIFELRPRYKMLGIIIAAIII 129 

TL MPM SYF YILPV++G LV+ TGFIDD++EL P+ K LGI++ A+II 

Sbjct: 67 TLALMPMILKTQIGGKSYFEYILPWLGALVIALTGFIDDVYELSPKIKFLGILLGAVII 126 

Query: 130 WKFTHFRFDSFKIPIGGPLLEFGPILTFFLTVLWIISITNAINLIDGLDGLVSGVSIISL 189 

W FT FRFDSFKIP GGP+L F P L+FFLT+LW+++ITNA+NLIDGLDGLVSGVS+ISL 
Sbjct: 127 WIFTDFRFDSFKIPFGGPMLHFNPFLSFFLTILWWAITNAVNLIDGLDGLVSGVSMISL 186 

Query: 190 ATMAWSYFFLPKIDFFLTLTIVILIASIVGFFPYNYHPAIIYLGDAGALFIGFMIGVLS 249 

TM +VSYFFL D FLTLTI +LI +1 GFFPYNYHPAI I YLGD GALFIGFMI VLS 
Sbjct: 187 TTMGLVSYFFLYDTDIFLTLTIFVLIFAIAGFFPYNYHPAIIYLGDTGALFIGFMISVLS 246 

Query: 250 LQGLKNSTAVAVITPVIILGVPILDTAVAIVRRKLSGKKISEADKMHLHHRLLSMGFTHR 309 

LQGLKN+TAVAV+TP+ 1 +LGVPI+DT VAI+RR LSG+K EAD MHLHHRLL+MGFTHR 
Sbjct: 247 LCGLKNATAVAVVTPIIVLGVPIVDTTOAIIRRTLSGQKFYFADNMHLHHRLLAMGFTHR 306 

Query: 310 GAVLVVYGIAIIFSLIALLLNVSSRIGGIFLLLALLLAMEIFIEGIiNIWGENRTPLFNLL 369 

GAVLWYGIA+ FSL++LLLNVSSR+GGI L++ + A+EIFIEGL IWG RTPLF LL 
Sbjct: 307 GAVLVWGIAMFFSLVSLLLNVSSRLGGILLMIGVAFALEIFIEGLEIWGPKRTPLFRLL 366 

Query: 370 KFIGNSDYRQSVIAKYSDKHQK 391 

FIGNSDYRQ V+AKY K +K 
Sbjct: 367 AFIGNSDYRQEWAKYRRKKKK 388 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5791> which encodes the amino acid
sequence <SEQ ID 5792>. Analysis of this protein sequence reveals the following:

Possible site: 32

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.28 Transmembrane 9 - 25 ( 1 - 33)

INTEGRAL Likelihood = -8.17 Transmembrane 201 - 217 ( 198 - 221)

INTEGRAL Likelihood = -7.64 Transmembrane 308 - 324 ( 305 - 329)

INTEGRAL Likelihood = -7.17 Transmembrane 55 - 71 ( 51 - 74)

INTEGRAL Likelihood = -7.06 Transmembrane 145 - 161 ( 138 - 170)

INTEGRAL Likelihood = -6.58 Transmembrane 260 - 276 ( 251 - 278)

INTEGRAL Likelihood = -6.21 Transmembrane 180 - 196 ( 172 - 198)

INTEGRAL Likelihood = -5.95 Transmembrane 331 - 347 ( 330 - 353)

INTEGRAL Likelihood = -5.68 Transmembrane 87 - 103 ( 82 - 104)

INTEGRAL Likelihood = -3.93 Transmembrane 113 - 129 ( 112 - 133)

INTEGRAL Likelihood = -2.60 Transmembrane 233 - 249 ( 232 - 250)



Final Results

bacterial membrane --- Certainty=0.4312 (Affirmative) < suco

bacterial outside --- Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases:

>GP:BAA82114 GB:AB022909 RgpG [Streptococcus mutans] 
Identities = 289/381 (75%), Positives = 334/381 (86%), Gaps = 1/381 (0%) 






Query: 5 TIDYVLVLIGALLMSLFLTPLVRFLAFRVGAVDNPNARRvNKVPMPTSGGLAIFMSFLVA 64 

T+ +VLVLI LL SL LTPLVRF A RVGAVDNPNARR+NKVPMP++GGLAI +SF++A 
Sbjct: 7 TLKFVLVLIATLLTSLVLTPLVRFFALRVGAVDNPNARRINKVPMPSAGGLAIIISFVIA 66 

Query: 65 SLGLIPIASKGAMFFGQTYFSYILPWIGATVITLTGFLDDLYELSPKLKMFGILIGAVI 124 

+L L+P+ K G++YF YILPW+GA VI LTGF+DD+YELSPK+K GIL+GAVT 

Sbjct: 67 TLALMPMILK-TQIGGKSYFEYILPvVLGMVIALTGFIDDvYELSPKIKFLGILLGAVI 125 

Query: 125 VWAFTDFKFDSFKIPFGGPLLVFGPFLTLFLTVLWIVSITNAINLIDGLDGLVSGVSIIS 184 

+W FTDF+FDSFKIPFGGP+L F PFL+ FLT+LW+V+ITNA+NLIDGLDGLVSGVS+IS 
Sbjct: 126 IWIFTDFRFDSFKIPFGGPMLHFNPFLSFFLTILWVVAITNAVNLIDGLDGLVSGVSMIS 185 

Query: 185 LVTMAIVSYFFLPQKDFFLTLTILVLISAIAGFFPYNYHPAMIYLGDTGALFIGFMIGVL 244 

L TM +VSYFFL D FLTLTI VLI AIAGFFPYNYHPA+IYLGDTGALFIGFMI VL 
Sbjct: 186 LTTMGLVSYFFLYDTDIFLTLTIFVLIFAIAGFFPYNYHPAIIYLGDTGALFIGFMISVL 245 

Query: 245 SLQGLKNSTAVAWTPVIILGVPIMDTIVAIIRRSLSGQKFYEPDKMHLHHRLLSMGFTH 304 

SLQGLKN+TAVAWTP+ 1 +LGVPI+DT VAI IRR+LSGQKFYE D MHLHHRLL+MGFTH 
Sbjct: 246 SLQGLKNATAVAWTPI I VLGVPIVDTTVAI IRRTLSGQKFYEADNMHLHHRLLAMGFTH 305 

Query: 305 RGAVLWYGITMLFSLISLLLNVSSRIGGVLLMLGLLFGLEVFIEGLEIWGEKRTPLFNL 364 

RGAVLWYGI M FSL+SLLLNVSSR+GG+LLM+G+ F LE+FIEGLEIWG KRTPLF L 
Sbjct: 306 RGAVLVWGIAMFFSLVSLLIOTSSRLGGILLMIGVAFALEIFIEGLEIWGPKRTPLFRL 365 

Query: 365 LKFIGNSDYRQAMLLKWKEKK 385
L FIGNSDYRQ ++ K++ KK 
Sbjct: 366 LAFIGNSDYRQEWAKYRRKK 386 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 282/384 (73%) , Positives = 334/384 (86%) , Gaps = 1/384 (0%) 

Query: 6 MIPFTIEYIFVLIGAFLLSIILTPIIRVISLKVGAVDKPNARRINKVPMPSSGGLAIFLS 65

M FTI+Y+ VLIGA L+S+ LTP++R ++ +VGAVD PNARR+NKVPMP+SGGLAIF+S 
Sbjct: 1 MFSFTIDYVLVLIGALLMSLFLTPLWFLAFRVGAVDNPNARRVNKVPMPTSGGLAIFMS 60 

Query: 66 FWTTLFFMPMAAS-RHFIEVSYFHYILPVIIGGLWTTTGFIDDIFELRPRYKMLGIII 124 
F+V +L +P+A+ F +YF YILPV+IG V+T TGF+DD++EL P+ KM GI+I

Sbjct: 61 FLVASLGLIPIASKGAMFFGQTYFSYILPWIGATVITLTGFLDDLYELSPKLKMFGILI 120 

Query: 125 AAIIIWKFTHFRFDSFKIPIGGPLLEFGPiLTFFLTVLWIISITNAINLIDGLDGLVSGV 184 
A+I+W FT F+FDSFKIP GGPLL FGP LT FLTVLWI+SITNAINLIDGLDGLVSGV 
Sbjct: 121 GAVIWAFTDFKFDSFKIPFGGPLLVFGPFLTLFLTVLWIVSITNAINLIDGLDGLVSGV 180
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Query: 185 SIISLATMAWSYFFLPKIDFFLTLTIVILIASIVGFFPYNYHPAIIYLGDAGALFIGFM 244 

SIISL TMA+VSYFFLP+ DFFLTLTI++LI++I GFFPYNYHPA+IYLGD GALFIGFM 
Sbjct: 181 SIISLVTMA1VSYFFLPQKDFFLTLTILVLISAIAGFFPYHYHPAMIYLGDTGALFIGFM 240 

Query: 245 IGVLSLQGLKNSTAVAVITPVIILGVPILDTAVAIVKEIKLSGKKISEADKMHLHHRLLSM 304 

IGVLSLQGLKNSTAVAV+TPVI ILGVPI+DT VAI+RR LSG+K E DKMHLHHRLLSM 
Sbjct: 241 IGVLSLQGLKNSTAVAVVTPVIILGVPIMDTIVAIIRRSLSGQKFYEPDKMHLHHRLLSM 300 

Query: 305 GFTHRGAVLWYGIAIIFSLIALLLNVSSRIGGIFLLLALLLAMEIFIEGLNIWGENRTP 364 

GFTHRGAVLWYGI ++FSLI+LLLNVSSRIGG+ L+L LL +E+FIEGL IWGE RTP 
Sbjct: 301 GFTHRGAVLWYGITMLFSLISLLLNVSSRIGGVLLMLGLLFGLEVFIEGLEIWGEKRTP 360 

Query: 365 LFNLLKFIGNSDYRQSVIAKYSDK 388 

LFNLLKFIGNSDYRQ+++ K+ +K 
Sbjct: 361 LFNLLKFIGNSDYROAMLLKWKEK 384 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1863 

A DNA sequence (GBSx1970) was identified in S.agalactiae <SEQ ID 5793> which encodes the amino
acid sequence <SEQ ID 5794>. This protein is predicted to be negative regulator of genetic competence. 
Analysis of this protein sequence reveals the following: 
Possible site: 16 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm — Certainty=0. 3460 (Affirmative) < suco 

bacterial membrane — Certainty=0. 0000 (Not Clear) < suco 

bacterial outside — Certainty=0. 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 9483> which encodes amino acid sequence <SEQ ID 9484> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA82113 GB:AB022909 negative regulator of genetic competence 
[Streptococcus mutans] 
Identities = 168/248 (67%) , Positives = 205/248 (81%) , Gaps = 9/248 (3%) 

Query: 1 MEMKQISETTLKITISMEDLEDRGMELKDFLIPQEKTEEFFYSVMDELDLPENFKNSGML 60 

MEMKQISETTLKITISMEDLE+RGMELKDFLIPQEKTEEFFY+VMDELDLPENFK SGML 
Sbjct: 1 MEMKQISETTLKITISMEDLEERGMELKDFLIPQEKTEEFFYTVMDELDLPENFKGSGML 60 

Query: 61 SFRVTPKKDRIDVFVTKSELSKDimEELADLGDISKMSPEDFFKTLEQSMLEKGDTDAH 120 

SFRVTP+ DRIDVFVTKSE++K+LNLE+L+D DISKMSPEDFF TLE++M EKGD A 
Sbjct: 61 SFROTPRNDRIDVFVTKSEINKNLNLEDLSDFDDISKMSPEDFFNTLEETMREKGDAAAL 120 

Query: 121 AKLAEIENIWDKATQEVvEENVSEEQPEKEVETIGYVHYVFDFDNIFAVVRFSQTIDFPI 180 

KLAEIE ++ TQ+ E+ ++E+ + YVH+V DF NI+ V+ F++T+D+ + 

Sbjct: 121 DKLAEIEKREEEKTQQ- -EKGETKEKRD YVHFVLDFPNIQQVISFAKTVDYDV 171 

Query: 181 EASELYKNGKGYHMTILLDLENQPSYFANLMYARMLEHANVGTKTRAYLKEHSIQLIHDD 240 

EASEL+K YHMT+LL+LE++P Y+A+LM+ARMLEHA GTKTRAYL EH +QLI D 
Sbjct: 172 EASELFKESDAYHMTVLLNLEDKPDYYADLMFARmiEHAGRGTKTRAYLLEHGVQLIKAD 231 

Query: 241 AISKLQMI 248 

A+ +LQMI 
Sbjct: 232 ALQELQMI 239 
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A related DNA sequence was identified in S.pyogenes <SEQ ID 5795> which encodes the amino acid 
sequence <SEQ ID 5796>. Analysis of this protein sequence reveals the following: 

Possible site: 18

>>> Seems to have no N-terminal signal sequence 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 171/253 (67%), Positives = 209/253 (82%), Gaps = 2/253 (0%) 



Query: 1 MEMKQISETTLKITISMEDLEDRGMELKDFLIPQEKTEEFFYSVMDELDLPENFKNSGML 60
MEMKQISETTLKITISM+DLE+RGMELKDFLIPQEKTEEFFYSVMDELDLP+NFK+SGML
Sbjct: 3 MEMKQISETTLKITISMDDLEERGMELKDFLIPQEKTEEFFYSVMDELDLPDNFKDSGML 62

Query: 61 SFRVTPKKDRIDVFVTKSELSKDLNLEELADLGDISKMSPEDFFKTLEQSMLEKGDTDAH 120
SFRVTP+KDR+DVFVTKSE++KD+NLE+LA+ GD+S+M+PEDFFK+LEQSM EKGD AH
Sbjct: 63 SFRVTPRKDRLDVFVTKSEINKDINLEDLAEFGDMSQMTPEDFFKSLEQSMREKGDVKAH 122

Query: 121 AKLAEIENMMDKATQEW--EENVSEEQPEKEVETIGYVHYVFDFDNIEAWRFSQTIDF 178
KL +IE +M+ + + + ++ E E + YVHYV DF I V F++TIDF
Sbjct: 123 EKLEKIEEI^DVVEATLANQSEAM)PSTMIESEPLDYVHYVXiDFSTITEAVAFAKTIDF 182

Query: 179 PIEASELYKNGKGYHMTILLDLENQPSYFANLMYARMLEHANVGTKTRAYLKEHSIQLIH 238
IEASELYK YHMTILLD++ QPSYFAN+MYAR++EHAN G+KTRAYL+EH +QL+
Sbjct: 183 SIEASELYKGSNCYHMTILLDVQQQPSYFANVMYARLIEHANPGSKTRAYLQEHGLQLML 242

Query: 239 DDAISKLQMIEMG 251
D A+ +LQ IE+G
Sbjct: 243 DGAVEQLQKIELG 255





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1864 

A DNA sequence (GBSx1971) was identified in S.agalactiae <SEQ ID 5797> which encodes the amino
acid sequence <SEQ ID 5798>. This protein is predicted to be BacA (bacA). Analysis of this protein 
sequence reveals the following: 

Possible site: 17 

>>> Seems to have no N-terminal signal sequence 
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Final Results 



bacterial cytoplasm Certainty=0.3307 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco



Final Results 



bacterial membrane --- Certainty=0.4609 (Affirmative) < suco

bacterial outside --- Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD50462 GB:AF169967 BacA [Flavobacterium johnsoniae] 
Identities = 101/275 (36%) , Positives = 165/275 (59%) , Gaps = 22/275 (8%) 



Query: 7 LKALFLGVVEGOTEWLPVSSTGHLILVQEFMKLNQSKSFVEMFNIVIQLGAIMAVIVIYF 66
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L+A+ L V+EG+TE+LPVSSTGH+I+ • F + + F ++F IVIQLGAI + +V+V+YF 
Sbjct: 4 LQAIVLAVIEGITEFLPVSSTGHMIIASSFFGIAH-EDFTKLFTIVIQLGAILSVWLYF 62 

Query: 67 KRIaNPFQPGKSAREIRLTWQLWLKWIACIPSILIALPFDNWFEAHFNFMIPIAIALIFY 126 
KR FQ T + K+++A IP++++ L ++ + + +A++L+

Sbjct: 63 KRF--FQ- TLDFYFKLLVAFIPAWLGLLLSDFIDGLLENPVTVAVSLLIG 110 



Query: 127 GFVFI WVEKRNAHLKPQVTELASMSYKTAFLIGCFQVLSIVPGTSRSGATILGAII 182 

G + + W NA Q ++Y A IG FQ ++++PG SRSGA+I+G + 

Sbjct: 111 GLILLKVDEWFNNPNAAETSQ KITYLQALKIGLFQC I AMI PGVSRSGAS IVGGMS 165






Query: 183 IGTSRSVAADFTFFLAIPTMFGYSGLKAVKYFIiDGNVLSLDQSLILLVASLTAFWSLYV 242 

SR+ AA+F+FFLA+PTM G + K Y+ G LS DQ IL++ ++ AF+V+L 
Sbjct: 166 QKLSRTTAAEFSFFIAVPTMLGATVKKCYDYYKAGFELSHDQVNILIIGNWAFIVALLA 225 

Query: 243 IRFLTDYVKRHDFTIFGKYRIVLGSLLILYWLWH 277 

1+ ++ ++ F +FG YRI+ G +L+L +H 
Sbjct: 226 IKTFISFLTKNGFKVFGYYRI IAGI ILLLIHFFIH 260 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5799> which encodes the amino acid
sequence <SEQ ID 5800>. Analysis of this protein sequence reveals the following: 

Possible site: 56 
>>> Seems to have no N-terminal signal sequence 

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Final Results 

bacterial membrane Certainty=0. 5522 (Affirmative) < suco 

bacterial outside — Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

>GP:AAD50462 GB:AF169967 BacA [Flavobacterium johnsoniae] 
Identities = 102/269 (37%) , Positives = 169/269 (61%) , Gaps = 14/269 (5%) 

Query: 7 LKAIFFGIIEGITEWLPVSSTGHLILVQEFIRLNQDKAFIEMFNIVIQLGAIIAVMLIYF 66

L+AI +IEGITE+LPVSSTGH+I+ F + + F ++F IVIQLGAI++V+++YF 
Sbjct: 4 LQAIVLAVTEGITEFLPVSSTGHMI IASSFFGIAHED-FTKLFTIVIQLGAILSWVLYF 62 

Query: 67 ERLNPFQPGKTAREVQLTWQLWLKWIACIPSILIAVPLDNWFEAHFYFMVPIAIALIVY 126
+R FQ T + K+++A IP++++ + L ++ + V +A++L++

Sbjct: 63 KRF--FQ TLDFYFKLLVAFIPAVVLGLLLSDFIDGLLENPVTVAVSLLIG 110 

Query: 127 GIAFIWIEKRNAQQEPAVTELARMSYKTAFFIGCFQvLSIVPGTSRSGATILGAIILGTS 186 
G+ + +++ A T +++Y A IG FQ ++++PG SRSGA+I+G + S 

Sbjct: 111 GLILLKVDEWFNNPNAAETS - QKITYLQALKIGLFQCIAMI PGVSRSGAS IVGGMSQKLS 169

Query: 187 RTVAADFTFFLAIPTMFGYSGLKAVKFFLDGHHLDFAQVLILLVASLTAFVVSLLAIRFL 246 

RT AA+F+FFLA+PTM G + K ++ G L QV IL++ ++ AF+V+LLAI+ 
Sbjct: 170 RTTAAEFSFFLAVPTMLGATVKKCYDYYKAGFELSHDQVNILIIGNWAFIVALLAIKTF 229 


Query: 247 TDYVKKHDFTIFGKYRIVLGSLLLIYSFF 275 

++ K+ F +FG YRI+ G +LL+ FF 
Sbjct: 230 I SFLTKNGFKVFGYYRI IAGI I LLLIHFF 258 

An alignment of the GAS and GBS proteins is shown below.

Identities = 227/272 (83%) , Positives = 253/272 (92%) 



Query: 1 MLIIELLKALFLGWEGVTEWLPVSSTGHLILVQEFMKLNQSKSFVEMFNIVIQLGAIMA 60 
MLI IELLKA+F G++EG+TEWLPVSSTGHLILVQEF++LNQ K+ F+EMFNI VIQLGAI +A 
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Sbjct: 1 MLIIELLKAIFFGIIEGITEWLPVSSTGHLILVQEFIRLNQDKAFIEMFNIVIQLGAIIA 60 

Query: 61 VIVIYFKRIaNPFQPGKSAREIRLTWQLWLKWIACIPSILIALPFDNWFEAHFNFMIPIA 120 

V++IYF+RLNPFQPGK+ARE++LTWQLWLKWIACIPSILIA+P DNWFEAHF FM+PIA 
Sbjct: 61 VMLIYFERLNPFQPGKTAREVQLTWQLWLKWIACIPSILIAVPLDNWFEAHFYFMVPIA 120 

Query: 121 IALIFYGFVFIWVEKRNAHLKPQVTELASMSYKTAFLIGCFQVLSIVPGTSRSGATILGA 180 

IALI YG FIW+EKRNA +P VTELA MSYKTAF IGCFQVLS I VPGTSRSGAT I LGA 
Sbjct: 121 IALIVYGIAFIWIEKRNAQQEPAVTELARMSYKTAFFIGCFQVLSIVPGTSRSGATILGA 180 

Query: 181 1 1 IGTSRSVAADFTFFIAI PTMFGYSGLKAVKYFLDGNVLSLDQSLILLVASLTAFWSL 240 

1 1 +GTSR+VAADFTFFLAI PTMFGYSGLKAVK+FLDG+ L Q LILLVASLTAFWSL 
Sbjct: 181 1 1 LGTSRTVAADFTFFLAI PTMFGYSGLKAVKFFLDGHHLDFAQVL I LLVASLTAFWSL 240 

Query: 241 YVIRFLTDYVKRHDFTIFGKYRIVLGSLLILY 272

IRFLTDYVK+HDFTI FGKYRIVLGSLL++Y 
Sbjct: 241 LAIRFLTDYVKKHDFTI FGKYRIVLGSLLLIY 272 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1865 

A DNA sequence (GBSx1972) was identified in S.agalactiae <SEQ ID 5801> which encodes the amino
acid sequence <SEQ ID 5802>. Analysis of this protein sequence reveals the following:

Possible site: 42 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.65 Transmembrane 494 - 510 ( 488 - 519) 

INTEGRAL Likelihood = -8.01 Transmembrane 263 - 279 ( 256 - 288) 

INTEGRAL Likelihood = -5.95 Transmembrane 25 - 41 ( 20 - 43) 

INTEGRAL Likelihood = -4.94 Transmembrane 475 - 491 ( 473 - 493) 
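
Each INTEGRAL line above gives a likelihood value, a predicted transmembrane span, and a second range in parentheses. As a hedged sketch of how such lines could be tabulated, the Python below extracts those five numbers per segment. The names `parse_segments`, "span" and "outer range" are labels chosen here, not terms from the source, and treating the most negative likelihood as the strongest prediction is an assumption, suggested by Example 1862, where the ALOM value (-12.10, against a threshold of 0.0) equals the most negative likelihood listed.

```python
import re

_SEGMENT = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_segments(text):
    """Return (likelihood, span_start, span_end, outer_start, outer_end) tuples."""
    return [
        (float(score), int(a), int(b), int(c), int(d))
        for score, a, b, c, d in _SEGMENT.findall(text)
    ]

report = """
INTEGRAL Likelihood = -8.65 Transmembrane 494 - 510 ( 488 - 519)
INTEGRAL Likelihood = -8.01 Transmembrane 263 - 279 ( 256 - 288)
INTEGRAL Likelihood = -5.95 Transmembrane 25 - 41 ( 20 - 43)
INTEGRAL Likelihood = -4.94 Transmembrane 475 - 491 ( 473 - 493)
"""
segments = parse_segments(report)
# Assumption: the most negative likelihood marks the strongest prediction.
strongest = min(segments, key=lambda s: s[0])
print(len(segments), strongest)  # 4 (-8.65, 494, 510, 488, 519)
```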

Final Results

bacterial membrane Certainty=0.4461 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 9481> which encodes amino acid sequence <SEQ ID 9482> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB99606 GB:U67598 M. jannaschii predicted coding region MJ1577 
[Methanococcus jannaschii]

Identities = 41/172 (23%), Positives = 78/172 (44%), Gaps = 19/172 (11%) 

Query: 479 LISFWIIYTLFLNYFTYFCIYLLLFGVILLLNKIIFMMTRKISNGYIVTEDGASRVYQW 538 
+IS ++ ++ F+ ++ + ++ ++ II +T G ++ +W 

Sbjct: 442 VI S ILLAVFLYFI PKYSQTFNEVFYLS I VFVVQNI ILALTPTSLFGRWKANYYKEKL - EW 500

Query: 539 TSFRNMLRDIKSFDRSELESIVLWNRILVYATLFGYADRVEKALR-VNQIDIPERFANID 597 

+F+N L ++ + E I +W L+Y T G D+V +A++ +N ++ +1 
Sbjct: 501 DAFKNFLSNIAMIKKYSPEDISIWKIJWLIYGTALGVGDKVVFJU4KSLNLSELVADYVIIH 560 



Query: 598 SHQFAI SVNQS SNHFSTITEDVSHASNFSVNSGGSSGGFSGGGG - - GGGGGA 647 

S+ ++ + S + ST GS GGF GGG GGGGGA 

Sbjct: 561 SNYDSMKTSVDSVYSSTT GSGGGFGAGGGFGGGGGGA 597 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5803> which encodes the amino acid
sequence <SEQ ID 5804>. Analysis of this protein sequence reveals the following: 



Possible site: 21 
>>> Seems to have a cleavable N-term signal seq.
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INTEGRAL Likelihood = -7.91 Transmembrane 486 - 502 ( 483 - 508) 

INTEGRAL Likelihood = -5.89 Transmembrane 465 - 481 ( 460 - 483) 

INTEGRAL Likelihood = -2.18 Transmembrane 244 - 260 ( 241 - 260) 



Final Results

bacterial membrane Certainty=0.4163 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases:

>GP:AAB99606 GB:U67598 M. jannaschii predicted coding region MJ1577 
[Methanococcus jannaschii]
Identities = 59/263 (22%) , Positives = 106/263 (39%) , Gaps = 14/263 (5%) 

Query: 369 FLDMAFGNKOTLPVDQLFSQYHYDADTIKQLKKTYKGKKLEQEVRQSSEQVIKAMKKASA 428

++ + G K+ + L + Y++D +K L K K + E +S Q K+ K 
Sbjct: 346 YIKII#TGGKIEILKTDLENLDVYESDvMKFLMKYSKNNVFDPEYIKSLAQKYKSSKDKLK 405 

Query: 429 AITNNVLETIKKLNLPDTYRQMTPA- -EKRKSNSVQGLGCLLLILNSGLLIYLAIKESGL 486 
++EK+P ++AER + L+ ++L L ++

Sbjct: 406 KLKD ELDKIMEYPRYSSKWNAFLETRGKKIIIALLVISILLAVFLYFIPKYSQTFN 462 

Query: 487 ALIYLALMVLTMCLGFYISLKLDQYKKLGIETPEGGVRLHQWQSFKNMIRDIDKFEDVAI 546 
+ YL+++ + ILL G +W +FKN + ++ + + 

Sbjct: 463 EVFYLS I VFWQ NIILALTPTSLFGRWKANYYKEKLEWDAFKNFLSNLAMIKKYSP 518

Query: 547 EGLVVWNRVLWATLFGYAKKVERYLKVHRIALPEVYQAVRPGELSMVMYATTPTFVSSL 606 

E + +W L+Y TGKV+K++ +V + Y+TVS+ 

Sbjct: 519 EDISIWKDWLIYGTALGVGDKWEAMKSLNLS ELVADYVI IHSNYDSMKTSVDS V 573 


Query: 607 SSATTSSNFSVSSGGGISGGGGG 629 

S+TT S +GGG GGGGG 

Sbjct: 574 YSSTTGSGGGFGAGGGFGGGGGG 596 

An alignment of the GAS and GBS proteins is shown below.

Identities = 241/635 (37%) , Positives = 372/635 (57%) , Gaps = 18/635 (2%) 

Query: 22 MKKCFLAI CLALSFFMVSVQADEVDYNI PHYEGNLTIHNDNSADFTEKVTYQFDSSYNGQ 81 
MKK + + L S + ++A +VDY+I +YEG L + +N+A F +KVTYQFD+SYNGQ 
Sbjct: 1 MKKILMTLVLCFSLLGIRIKAADVDYSITNYEGQLLLSKENTARFEQKVTYQFDTSYNGQ 60

Query: 82 YVTLGTAGKLPDNFDINNKPQVEVSINGKVRKVSYQIEDLEDGYRLKVFNGGEAGDTVKV 141 

Y++LG G LP F 1+ KP+VEV NG+ VS + DL DGYRLK++N G+AGD V V 
Sbjct: 61 YISLGRTGHLPAGFAIDQKPKVEVYQNGQQVPVSQEFSDLGDGYRLKLYNAGQAGDKVDV 120 

45 

Query: 142 NVQWKLKNVLFMHKDVGELNWIPISDWDKTLEKVDFWISTDKKVALSRLWGHLGYL-KTP 200 

V W+L ++L ++DV ELNW PISDWDKTLEKV ++T + S LW H GY K P 
Sbjct: 121 KVIWQLHHLLTAYQDVAELNWTPISDWDKTLEKVSLTVTTPTDIQDSNLWAHRGYYQKKP 180 

50 Query: 201 PKIRQNNNRYHLTAFNVNKRLEFHGYWDRSYF- -NLPTNSKNNYKKKIEYQEKMIERHGF 258 

+++ N+RY + A NV+ +LE H YWD+ P + + K KI E I R 

Sbjct: 181 QVLKEGNSRYQINAKNVSGQLELHAYWDKKALLGKEPVDVSTSKKNKIVALETKISRRRT 240 

Query: 259 ILSFLLRILLPSFFIIVTLFISIRVFLFRKKVNKYGQFPKEHHLYEAPEDLSPLELTQSI 318 
55 +L L ++P + L+ 1+ +K+ N+Y H YE PEDLSPL LTQ+I 

Sbjct: 241 LLQLLFGKVIPLVEVGFLLWQLIQFTRLKKQFNRYHLANHTDHSYEVPEDLSPLVLTQAI 300 

Query: 319 YSMSFKNFQ DEEKKTHL 1 SQEQLIQSILLDLIDRKVL NYDDNLLSLANLD 368 

Y SF E +K + ++ E L+Q+ LLDLID+KVL L ++ LD 

60 Sbjct: 301 YGQSFAYLSPTASESQKLLIPKGVTFEALVQATLLDLIDQKVLLLTKEEGKAYLEISQLD 360 

Query. 369 RASDAEIDFIEFAFADSTSLKPDQLFSNYQFSYKETLRELKKQHKASDLQTQMRRRGSNA 428 

R +D E F++ AF + +L DQLFS Y + +T+++LKK +K L+ ++R+ 
Sbjct: 361 RVTDEFAAFLD^FGNKVTLPVBQLFSQYHYD-ADTIKQLKICrYKGKKLEQEvRQSSEQV 419 



65 



Query: 429 LSRITRLTRLISKDNINSLRRKGISSPYRKMSSEESKELSRLKRFSYLSPLISFWIIYT 488 
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+ + + + 1+ + + ++++ + YR+M+ E ++ + ++ L +++ ++IY 
Sbjct: 420 IKMKKASAAITNNVLETIKKLNLPDTYRQMTPAEKRKSNSVQGLGCLLLILNSGLLIY- 478 

Query: 489 LFI^FTYFCIYLLLFGVILLLNKIIFMMTRKISNGYIVTEDGASRVYQWTSFRNMI.RDI 548 
5 L + IYL L + + L 1+ + IT+G R++QW SF+NM+RDI 

Sbjct: 479 LAIKESGLALIYLALMVLTMCLGFYISLKLDQYKKLGIETPEGGWLHQWQSFKNMIRDI 538 

Query: 549 KSFDRSELESIVLWNRILVYATLFGYADRVEKALRVNQIDIPERFANIDSHQFAISVNQS 608 
F+ +E +V+WNR+LVYATLFGYA +VE+ L+V++I +PE + + + ++ + + 
10 Sbjct: 539 DKFEDVAIEGLVVWNRVLWATLFGYAKKOTRYIiKVHRIALPEVYQAVRPGELSMVMYAT 598 

Query: 609 SNHFSTITEDVSHASNFSVNSGGSSGGFSGGGGGG 643 

+ F + + +SNFSV+SG GG SGGGGGG 

Sbjct: 599 TPTFVSSLSSATTSSNFSVSSG- - -GGISGGGGGG 630 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
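
The alignment summary lines in these examples (for instance "Identities = 241/635 (37%) , Positives = 372/635 (57%) , Gaps = 18/635 (2%)") follow a fixed pattern, so they can be read mechanically. The following is a minimal Python sketch, not part of the original analysis; the names `parse_summary` and `exact_percent` are hypothetical. It extracts the counts and recomputes the exact ratio for comparison. The reported percentages are whole numbers, so they need not coincide with a rounded recomputation.

```python
import re
from fractions import Fraction

# One "Name = hits/length (pct%)" field; \s* tolerates stray spaces left by text extraction.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)

def parse_summary(line):
    """Return {field: (hits, length, reported_percent)} for one summary line."""
    return {name: (int(h), int(n), int(p)) for name, h, n, p in FIELD.findall(line)}

def exact_percent(hits, length):
    """Exact percentage as a float, for comparison with the reported integer."""
    return float(Fraction(hits, length) * 100)

summary = parse_summary(
    "Identities = 241/635 (37%) , Positives = 372/635 (57%) , Gaps = 18/635 (2%)"
)
for name, (hits, length, reported) in summary.items():
    print(f"{name}: {hits}/{length}, reported {reported}%, exact {exact_percent(hits, length):.2f}%")
```

Lines without a Gaps field (as in some of the GAS/GBS alignments) simply yield a dictionary with two entries.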

A related GBS gene <SEQ ID 8921> and protein <SEQ ID 8922> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 4

McG: Discrim Score: 10.29
GvH: Signal Score (-7.5): 3.11

Possible site: 23
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 3 value: -8.65 threshold: 0.0

INTEGRAL Likelihood = -8.65 Transmembrane 475 - 491 ( 469 - 500)
INTEGRAL Likelihood = -8.01 Transmembrane 244 - 260 ( 237 - 269)
INTEGRAL Likelihood = -4.94 Transmembrane 456 - 472 ( 454 - 474)
PERIPHERAL Likelihood = 2.28 540
modified ALOM score: 2.23

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.4461 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
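
Each "Final Results" listing gives one certainty value per candidate location (membrane, outside, cytoplasm). As an illustration only, the Python sketch below reads such a listing and reports the location with the highest certainty. The function names are hypothetical, and the pattern deliberately allows whitespace around the decimal point because extracted copies of this text often contain it.

```python
import re

# Location name, then the certainty split around the decimal point, then the verdict.
RESULT = re.compile(
    r"bacterial\s+(membrane|outside|cytoplasm)\W*Certainty\s*=\s*(\d)\s*\.\s*(\d+)\s*\(([^)]*)\)"
)

def parse_final_results(text):
    """Map each location to (certainty, verdict)."""
    results = {}
    for location, whole, frac, verdict in RESULT.findall(text):
        results[location] = (float(f"{whole}.{frac}"), verdict.strip())
    return results

def predicted_location(results):
    """Location with the highest certainty value."""
    return max(results, key=lambda loc: results[loc][0])

listing = """
bacterial membrane Certainty=0 . 4461 (Affirmative)
bacterial outside Certainty=0 . 0000 (Not Clear)
bacterial cytoplasm Certainty=0 . 0000 (Not Clear)
"""
results = parse_final_results(listing)
print(predicted_location(results), results)
```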



The protein has no homology with any sequences in the databases. 



Example 1866

A DNA sequence (GBSx1973) was identified in S.agalactiae <SEQ ID 5805> which encodes the amino
acid sequence <SEQ ID 5806>. This protein is predicted to be glutamine-binding periplasmic
protein/glutamine transport system perme. Analysis of this protein sequence reveals the following:

Possible site: 24
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -8.86 Transmembrane 301 - 317 ( 295 - 324)
INTEGRAL Likelihood = -6.05 Transmembrane 479 - 495 ( 473 - 496)
INTEGRAL Likelihood = -0.59 Transmembrane 369 - 385 ( 369 - 385)

Final Results

bacterial membrane Certainty=0.4545 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAA17584 GB:D90907 glutamine-binding periplasmic protein 
[Synechocystis sp.] 

Identities = 147/534 (27%) , Positives = 256/534 (47%) , Gaps = 75/534 (14%) 
Query: 4 ILLSLFTALLITFGGMTSIQADEYLRVGMEAAYAPFNWTQMDNTNGAVPIEGTDQYANGY 63
+LL++ LL F .++ + + V E + PF T E T Q G+
Sbjct: 24 VLLAIAI PLLPAFSQVSR QTI IVATEPTFPPFEMTD EATGQLT-GF 68

Query: 64 DVQVAKKIAKKLNKKVVVVKTKWEGLVPALTSGKLDMIIAGMSPTEERKKEINFSKPYYI 123
DV + + + + V + ++G++PAL S + 1+ ++ T ER + ++FS PY+
Sbjct: 69 DVDLIQAIGEAAQVTVDIQGYPFDGIIPALQSNTVGAAISAITITPERAQSVSFSSPYFK 128

Query: 124 SEPTLVVNAEGKYMAKNISDFKNAKOTAQQGVYLYNLIDQINGVKKEVAMGDFNQLRQA 183
S L + + KN+ D+++ G ++GK + +F+ + A
Sbjct: 129 S--VLAIAVQDGNDTIKNLKDLEGKRLAVAIGTTGAMVATNVPGAK VTNFDSITSA 182

Query: 184 VE SGWDAYVSERPDATSAQTANPKLKMIELHQGFKTSDADTNISVGMRKGDNRINQ 240
++ +G DA +++RP A + L+ +++ + D I++ + INQ
Sbjct: 183 LQELTOGNADAVINDRPVLLYA-IKDAGLRNVKISADVGSEDY-YGIAMPIAP-PGEINQ 239

Query: 241 VNQVL ESISRDKQIALMDKMIKEQ PSV KKEKNGK 274
+VL +1 A+ +K E+ PS+ + + N
Sbjct: 240 TREVLNQGLFQIIENGTYNAIYEKWFGEKNPPFLPLVAPSLVGKVGTAQSLTERSQANPN 299

Query: 275 PNFFEQMATILKNNGSQFLRGTATTLLISMVGTIVGLFIGLLIGVFRTAPKSDNKLKAAL 334
NF + T+ +N +G+ T+L++ GL G + + A SD
Sbjct: 300 DNF LITLFRN LFKGSILTVLLTAFSVFFGLIGGTGVAI ALISD 342

Query: 335 QKLLGWLLNIYIEVFRGTPMIVQSMVIYYGTAQAF GVSLDRTLAAIFIVSINTGA 389
K L + IY+E FRGTPM+VQ +IY+G F G+++DR AAI +S+N A
Sbjct: 343 IKPLQLIFRIYVEFFRGTPMLVQLFIIYFGLPALFKEIGLGITIDRFPAAIIALSLNvAA 402

Query: 390 YMSEIVRGGIFSVDKGQFEAATALGFTHGQTMRKIVLPQWRNILPATGNEFVINIKDTS 449
Y++EI+RGGI S+D+GQ+EA +LG + QTM++++ PQ R ILP GNEF+ IKDTS
Sbjct: 403 YLAEIIRGGIQSIDQGQWEACESLGMSPWQTMKEVIFPQAFRRILPPLGNEFITLIKDTS 462

Query: 450 VLWISVVELYFSGNTVATQTYQYFQTFTIIAIIYFILTFTVTRILRYIEKRFD 503
+ VI EL+ G + TY+ F+ + +A++Y +LT + + +++E D
Sbjct: 463 LTAVIGFQELFREGQLIVATTYRAFEVYIAVALVYLLLTTISSFVFKWLENYMD 516

There is also homology to SEQ ID 1194.

A related GBS gene <SEQ ID 8923> and protein <SEQ ID 8924> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 2
McG: Discrim Score: 6.23
GvH: Signal Score (-7.5): 0.11
Possible site: 24
>>> Seems to have a cleavable N-term signal seq.

ALOM program count: 3 value: -8.86 threshold: 0.0

INTEGRAL Likelihood = -8.86 Transmembrane 301 - 317 ( 295 - 324)
INTEGRAL Likelihood = -6.05 Transmembrane 479 - 495 ( 473 - 496)
PERIPHERAL Likelihood = 1.32 441
modified ALOM score: 2.27

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.4545 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
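
The "INTEGRAL Likelihood" lines in listings such as the one above each name a predicted transmembrane segment with its residue range. A minimal Python sketch for collecting them follows. It is an illustration under stated assumptions rather than the method used in this analysis: the names `Segment` and `parse_segments` are hypothetical, and treating the most negative likelihood as the strongest prediction is an assumption about how the ALOM score is read.

```python
import re
from typing import NamedTuple

class Segment(NamedTuple):
    likelihood: float  # ALOM likelihood value as printed
    start: int         # first residue of the predicted segment
    end: int           # last residue of the predicted segment

# Accepts both "= -8.86" and "=-17.94" spacing variants.
INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)

def parse_segments(text):
    """Return all INTEGRAL transmembrane segments found in a listing."""
    return [Segment(float(v), int(s), int(e)) for v, s, e in INTEGRAL.findall(text)]

listing = """
INTEGRAL Likelihood = -8.86 Transmembrane 301 - 317 ( 295 - 324)
INTEGRAL Likelihood = -6.05 Transmembrane 479 - 495 ( 473 - 496)
PERIPHERAL Likelihood = 1.32 441
"""
segments = parse_segments(listing)
strongest = min(segments, key=lambda seg: seg.likelihood)  # assumption: lower = stronger
print(strongest, strongest.end - strongest.start + 1)
```

PERIPHERAL lines are ignored by design, since they carry a single position rather than a residue range.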

The protein has homology with the following sequences in the databases: 

34.3/57.3% over 462aa

Synechocystis PCC6803 

EGAD|48193| glutamine-binding periplasmic protein/glutamine transport system permease
protein Insert characterized
GP|1652664|dbj|BAA17584.1||D90907 glutamine-binding periplasmic protein {Synechocystis
sp.} Insert characterized

PIR|S77250|S77250 hypothetical protein - Synechocystis sp. (strain PCC 6803) Insert
characterized

ORF01242(454 - 1809 of 2148)

EGAD|48193|sll1270(54 - 516 of 530) glutamine-binding periplasmic protein/glutamine
transport system permease protein {Synechocystis PCC6803}GP|1652664|dbj|BAA17584.1||D90907
glutamine-binding periplasmic protein {Synechocystis sp.}PIR|S77250|S77250 hypothetical
protein - Synechocystis sp. (strain PCC 6803)
%Match =12.3 

%Identity =34.2 %Similarity = 57.2 

Matches = 128 Mismatches = 149 Conservative Sub.s = 86 



204 234 264 294 324 354 384 414 

PSWCIPF*HKOTimFQ*DNDIEIDLVFR*NRRK*LIGGC*MKKILLSLFTALLITXGGMTSIQADEYLRVGMEAAYAP 

MKGMVKLGHWGKTWRYYLLLALGVLLA.IAI PLLPAFSQVS 
10 20 30 40 



444 474 495 525 555 585 615 645 

FNWTQNDNTNGAVPIEGTDQ- - -YANGYDVQVAKKLAKKLNKKVVWKTKWEGLVPALTSGKLDMI I AGMS PTEERKKE I 

I I ||: hll : : : : I = -l-lll I = 1= = = I II = = 

RQTIIVATEPTFPPFEMTDEATGQLTGFDVDLIQAIGEAAQVTVDIQGYPFDGIIPALQSNTVGAAISAITITPERAQSV 

50 60 70 80 90 100 110 120 



675 705 735 765 795 825 855 885 

NFSKPYYISEPTLVWAEGKYTNAKNISDFKNAKVTAQQGVYLYNLIDQINGVKKEVAMGDFNQLRQAVESGVVDAYVSE 

=11 11= I = I =1 I 11= l== == 1 = = I I : l== I =1 II === 

SFSSPYFKSVIAIAVQ-IXSNOT-IKNLKDLEGKRIAVAIGTTGftMVA 

130 140 150 160 170 180 190 



903 957 987 

RP--- DATSAQTANPKLK-MIELHQG- FKTSDADTNISV 

II I III =11= III 

RPVLLYAI KDAGLRNVKI SADV NPPFLPLVAPSLVGKVGTAQSLTERSQANPNDNFLITLFRNLFKGS- - 

210 270 280 290 300 310 

1017 1047 1077 1107 1137 1167 1197 1227 

G^KGDNRINQWQVLESISRDKQIALMDKMIKEQPSVOEKNGKPNFFEQMATILKNNGSQFLRGTATTLLISMVGTIV 

::::: | 

ILTVLLTAF 

320 



1257 1284 1314 1344 1374 1404 1419 1449 
GLFIGLLIGV-FRTAPKSDNKLKAALQKLLGWLLNIYIEVFRGTPMIVQSMVIYYGTAQAF GVSLDRTLAAIFIV 

=1 11= I I II II - 11=1 111111=11 =11=1 I |::=ll III = 

SVFFGLIGGTGVAIALISD IKPLQLIFRIYVEFFRGTPMLVQLFI I YFGLPALFKEIGLGITIDRFPAAI IAL 

340 350 360 370 380 390 

1479 1509 1539 1569 1599 1629 1659 1689 

SINTGAYMSEIWGGIFSVDKGQFEAATALGFTHGQTMRKIVLPQVVR^^ 

1=1 ll==ll=llll 1=1=11=11 =11 = lll=====ll I III 1111= 11111= II 11= I 
SIiWAAYLAEIIRGGIQSIDQGQWEACESLGMSPWQTMKEVIFPQAFRRILPPLGNEFITLIKDTSLTAVIGFQELFREG 
410 420 430 440 450 460 470 



1719 1749 1779 1809 1839 1869 1899 1929 

OTTOTQTYQYFQTFTIIAIIYFILTFTVTRILRYIEKSFDSDNYTTGANQLQV*EVGMTQAILEIKHLKKSYGSNEVLKD 

= ||= 1= = =l==l==ll = ===:=l I 
QLIVATTYRAFEWIAVALVYLLLTTISSFVFKWLENYMDPIGRAKKKAKRATA 
490 500 510 520 530 

There is also homology to SEQ ID 5804. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1867

A DNA sequence (GBSx1974) was identified in S.agalactiae <SEQ ID 5807> which encodes the amino
acid sequence <SEQ ID 5808>. This protein is predicted to be ATP-binding. Analysis of this protein
sequence reveals the following:

Possible site: 44

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3208 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>





The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB73160 GB:AL139076 putative glutamine transport ATP-binding
protein [Campylobacter jejuni]

Identities = 132/241 (54%) , Positives = 178/241 (73%) , Gaps = 1/241 (0%)

Query: 5 ILEIKHLKKSYGSNEVLKDISLSVNKGEVISIIGSSGSGKSTFLRSINLLEEPSGGEILY 64
++E+K+L+K YG EVLK+I+ +++KG+VI+IIG SG GKSTFLR IN LE GEIL
Sbjct: 1 MIEVKNLQKKYGELEVLKNINTTISKGDVIAIIGPSGGGKSTFLRCINRLELADSGEILI 60

Query: 65
+ N+L+K D+N R+K+ MVFQ FNLF N N++EN + ++EA K AK
Sbjct: 61

Query: 125
L VG+ ++ P +LSGGQKQR+AIAR+L +NP+ ILFDEPTSALDPEM+GEVL M
Sbjct: 121

Query: 185
+D+AK GLTM+ +VTHEM FA+ V++R+ FMDKG IA +PK++FENP+ ER +EFL + L
Sbjct: 180 KDVAKEGLTMLWTHEMGFARNVANRI FFMDKGKIATOASPKEVFENPSNERLREFLNKVL 240

A related DNA sequence was identified in S.pyogenes <SEQ ID 2157> which encodes the amino acid
sequence <SEQ ID 2158>. Analysis of this protein sequence reveals the following:

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1170 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 212/246 (86%) , Positives = 237/246 (96%)

Query: 1 MTQAILEIKHLKKSYGSNEVLKDISLSVNKGEVISIIGSSGSGKSTFLRSINLLEEPSGG 60
M+ +I+EIK+LKKSYGSNEVLKDISLSVNKGEVISIIGSSGSGKST LRSINLLEEPS G
Sbjct: 24 MSNSIIEIKNLKKSYGSNEVLKDISLSVNKGEVISIIGSSGSGKSTLLRSINLLEEPSAG 83

Query: 61 EILYHGHNVLEKGYDLNNYREKLGMVFQSFNLFENI^ILENAIVAQTTVLKRERQEAEKI 120
+ IL+HG +VL + Y+L +YREKLGMVFQSFNLFENLN+LENAIVAQTTVLKR+R +AE+I
Sbjct: 84 QILFHGEDVLAEHYNLTHYREKIGMVFQSFNL^^ 143

Query: 121 AKENLNAVGMTEQYWKAKPKQLSGGQKQRVAIARALSVNPEAILFDEPTSALDPEMVGEV 180
AKENLNAVGMTEQYW+AKPKQLSGGQKQRVAIARALSVNPEA+LFDEPTSALDPEMVGEV
Sbjct: 144 AKENLNAVGMTEQYWQAKPKQLSGGQKQRVAIARALSVNPEAMLFDEPTSALDPEMVGEV 203

Query: 181 LKTMQDLAKSGLTMIIVTHEMEFAKEVSDRVIFMDKGIIAEQGTPKQLFENPTQERTKEF 240
LKTMQDLAKSGLTMIIVTHEMEFA++VSDR+IFMDKG+I E+G+P+Q+FENPTQ+RTKEF
Sbjct: 204 LKTMQDLAKSGLTMIIVTHEMEFARDVSDRIIFMDKGLITEEGSPQQIFENPTQDRTKEF 263

Query: 241 LQRFLK 246
LQRFLK
Sbjct: 264 LQRFLK 269

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
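
In the alignments shown in these examples, the middle line of each block repeats the residue letter where query and subject are identical and shows "+" where the substitution is conservative. Under that convention, the Identities and Positives counts can be tallied from the middle lines alone. The Python sketch below illustrates this; the function name `midline_counts` is hypothetical, and counts taken from damaged text will only be as reliable as the extraction.

```python
def midline_counts(midline):
    """Return (identities, positives) for one alignment middle line.

    Letters mark identical residues; '+' marks conservative substitutions.
    Positives include identities, matching the summary-line convention.
    """
    identities = sum(ch.isalpha() for ch in midline)
    conservative = midline.count("+")
    return identities, identities + conservative

def total_counts(midlines):
    """Sum the per-block counts over every middle line of one alignment."""
    pairs = [midline_counts(m) for m in midlines]
    return sum(p[0] for p in pairs), sum(p[1] for p in pairs)

print(midline_counts("LQRFLK"))
print(total_counts(["LQRFLK", "F+ S YS Y+ VK A++"]))
```

Spacing inside the middle line does not affect the tally, which makes the approach tolerant of collapsed whitespace but not of misread characters.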

Example 1868 

A DNA sequence (GBSx1976) was identified in S.agalactiae <SEQ ID 5809> which encodes the amino
acid sequence <SEQ ID 5810>. This protein is predicted to be hypersensitive-induced response protein.
Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-17.94 Transmembrane 4 - 20 ( 1 - 28)

Final Results

bacterial membrane Certainty=0.8175 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9479> which encodes amino acid sequence <SEQ ID 9480> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF68390 GB:AF236374 hypersensitive-induced response protein
[Zea mays]

Identities = 127/275 (46%) , Positives = 174/275 (63%) , Gaps = 1/275 (0%) 

Query: 19 ITSLYVVKQQTVAIIERFGKYQKTATSGIHIRVPLGIDKIAARVQLRLLQSEIIVETKTK 78
I L V Q TVAI E FGK+ + G H +IA + LR+ Q ++ ETKTK
Sbjct: 4 ILGLVQVDQSTVAIKENFGKFSEVLEPGCHFLPWCIGQQIAGYLSLRVRQLDVRCETKTK 63

Query: 79 DNVFVTLNIATQYRVNENNVTDAYYKLIKPEAQIKSYIEDALRSSVPKLTLDELFEKKDE 138
DNVFVT+ + QYR + +DA+YKL QI+SY+ D +R++VPKL LD+ FE+K+E
Sbjct: 64 DNVFVTVVASVQYRALADKASDAFYKLSNTREQIQSYVFDVIRATVPKLGLDDAFEQKNE 123

Query: 139 IALEVQHQVAEEMSTYGYIIVKTLITKVEPDAEVKQSMNEINAAQRKRVAAQELANADKI 198
IA V+ ++ + MSTYGY IV+TLI +EPD VK++MNEINAA R RVAA E A A+KI
Sbjct: 124 IAKAVEEELEKAMSTYGYQIVQTLIVDIEPDDRVKRAMNEINAAARMRVAASEKAEAEKI 183

Query: 199 KIVTAAEAEAEKDRLHGVGIAQQRKAIVDGLADSIQELKDANVTLTEEQIMSILLTNQYL 258
+ AE EAE L GVGIA+QR+AIVDGL DS+ + T + IM ++L QY
Sbjct: 184 LQIKKAEGEAESKYLAGVGIARQRQAIVDGLRDSVLAFSENVPGTTAKDIMDMVLVTQYF 243

Query: 259 DTLNTF-AINGNQTIFLPNNPEGVEDIRTQVLSAL 292
DT+ A + + ++F+P+ P V+D+ Q+ L
Sbjct: 244 DTMREIGASSKSSSVFIPHGPGAVKDVSAQIRDGL 278

A related DNA sequence was identified in S.pyogenes <SEQ ID 5811> which encodes the amino acid
sequence <SEQ ID 5812>. Analysis of this protein sequence reveals the following:

Possible site: 32

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-13.06 Transmembrane 5 - 21 ( 1 - 29)

Final Results

bacterial membrane Certainty=0.6222 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAF68390 GB:AF236374 hypersensitive-induced response protein 
[Zea mays] 

Identities = 126/273 (46%) , Positives = 174/273 (63%) , Gaps = 3/273 (1%) 



Query: 23 LYVVRQQSVAIVERFGRYQKTATSGIHIRLPFGI-DKIAARVQLRLLQSEIIVETKTKDN 81
L V Q +VAI E FG++ + G H LP+ I +IA + LR+ Q ++ ETKTKDN
Sbjct: 7 LVQVDQSTVAIKENFGKFSEVLEPGCHF-LPWCIGQQIAGYLSLRVRQLDVRCETKTKDN 65

Query: 82 VFVTIJWATQYRVNEQNVTDAYYKLM 141
VFVT+ + QYR +DA+YKL QI+SY+ D +R++VPKL LD+ FE+K+EIA
Sbjct: 66 VFVTVVASVQYRALADKASDAFYKLSNTREQIQSYVFDVIRATVPKLGLDDAFEQKNEIA 125

Query: 142 LEVQHQVAEEMSTYGYIIVKTLITKVEPDAEVKQSMNEINAAQRKRVAAQELANADKIKI 201
V+ ++ + MSTYGY IV+TLI +EPD VK++MNEINAA R RVAA E A A+KI
Sbjct: 126 KAVEEELEKAMSTYGYQIVQTLIVDIEPDDRVKRAMNEINAAARMRVAASEKAEAEKILQ 185

Query: 202 VTAAEAEAEKDRLHGVGIAQQRKAIVDGLAESIQELKEANISLNEEQIMSILLTNQYLDT 261
+ AE EAE L GVGIA+QR+AIVDGL +S+ E + IM ++L QY DT
Sbjct: 186 IKKAEGEAESKYLAGVGIARQRQAIVDGLRDSVLAFSENVPGTTAKDIMDMVLVTQYFDT 245

Query: 262 LNTFAAKG-NQTLFLPNTPSGVEDIRTQVLSAL 293
+ A + ++F+P+ P V+D+ Q+ L
Sbjct: 246 MREIGASSKSSSVFIPHGPGAVKDVSAQIRDGL 278





An alignment of the GAS and GBS proteins is shown below. 
Identities = 254/291 (87%) , Positives = 278/291 (95%) 



Query: 5 IILVVILVLVIVLLITSLYVVKQQTVAIIERFGKYQKTATSGIHIRVPLGIDKIAARVQL 64
I + +++++ ++ ++LYW+QQ+VAI+ERFG+YQKTATSGIHIR+P GIDKIAARVQL
Sbjct: 6 IFIAFGVIVIIAIVASTLYVVRQQSVAIVERFGRYQKTATSGIHIRLPFGIDKIAARVQL 65

Query: 65 RIiLQSEIIVETKTKDOTFVTLNIATQYRv^^ 124
RLLQSEIIVETKTKDNVFVTLN+ATQYRVNE NVTDAYYKL+KPE+QIKSYIEDALRSSV
Sbjct: 66 RLLQSEIIVETKTKDNVFVTLNVATQYRVNEQNVTDAYYKLMKPESQIKSYIEDALRSSV 125

Query: 125 PKLTLDELFEKKDEIALEVQHQVAEEMSTYGYIIVKTLITKVEPDAEVKQSMNEINAAQR 184
PKLTLDELFEKKDEIALEVQHQVAEEMSTYGYIIVKTLITKVEPDAEVKQSMNEINAAQR
Sbjct: 126 PKLTLDELFEKKDEIALEVQHQVAEEMSTYGYIIVKTLITKVEPDAEVKQSMNEINAAQR 185

Query: 185 KRVAAQELANADKIKIVTAAEAEAEKDRLHGVGIAQQRKAIVDGLADSIQELKDANVTLT 244
KRVAAQELANADKIKIVTAAEAEAEKDRLHGVGIAQQRKAIVDGLA+SIQELK+AN++L
Sbjct: 186 KRVAAQELANADKIKIVTAAEAEAEKDRLHGVGIAQQRKAIVDGLAESIQELKEANISLN 245

Query: 245 EEQIMSILLTNQYLDTLNTFAINGNQTIFLPNNPEGVEDIRTQVLSALKTR 295
EEQIMSILLTNQYLDTLNTFA GNQT+FLPN P GVEDIRTQVLSALKT+
Sbjct: 246 EEQIMSILLTNQYLDTLNTFAAKGNQTLFLPNTPSGVEDIRTQVLSALKTK 296





SEQ ID 5810 (GBS231) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 55 (lane 7; MW 60.9kDa). 

GBS231d was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is 
shown in Figure 155 (lane 5-7; MW 59kDa) and in Figure 239 (lane 11; MW 59kDa). It was also expressed 
in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 155 (lane 9; 
MW 34kDa) and in Figure 183 (lane 6; MW 34kDa). Purified GBS231d-GST is shown in Figure 246, lane 
8. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1869

A DNA sequence (GBSx1977) was identified in S.agalactiae <SEQ ID 5813> which encodes the amino
acid sequence <SEQ ID 5814>. Analysis of this protein sequence reveals the following:

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2305 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9291> which encodes amino acid sequence <SEQ ID 9292> 
was also identified. 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13457 GB:Z99112 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 259/514 (50%) , Positives = 350/514 (67%) , Gaps = 9/514 (1%) 



Query: 1 MGMTMENGAKEVSDKPATTVGEVGQILSKGVLMGARGNSGVITSQLFRGFGQSIKDKEEL 60
M ++M +GA+EV +G+VG LSKG+LMGARGNSGVI SQLFRGF ++I+ K+E+
Sbjct: 45 MNLSMTSGAREVEQMDTDDIGKVGSALSKGLLMGARGNSGVILSQLFRGFSKNIETKKEI 105

Query: 61 TGQDLAHAFQNGVEVAYKAVMKPVEGT 120
+ A A y (jV + +Aii\Jivi v llS.b J VEitjl ILil v + 4- Ai\ A-f A_Ej+ JJ +1 V I A + A
Sbjct: 106 NALEFAAALQAGVDMAYKAVMKPVEGTILTVAKDAAKKAM1LAEICETDITALMTAVTEEA 165

Query: 121 KRALAKTPDMLPVLKEVGVVDSGGQGLVFIYEGFLSALTGEYIASEDFKATPATMTEMVN 180
+ +L +TP++LPVLKEVGWDSGG+GL+ +YEGFL++L GE + KA ++ +MV+
Sbjct: 166 EASLNRTPELLPVLKEVGVVDSGGKGLLCVYEGFIiASLKGETVPQ---KAVLPSLDDMVS 222

Query: 181 AEHHKAVVGHVATEDIKYGYCTEVMVGLKQGPTYVKEFNYEEFQGYLSNLGDSLLVVNDD 240
AEHHK+ + TEDI++G+CTEVMV L Q +EF+ F+ LS GDSLLV+ D+
Sbjct: 223 AEHHKSAQSMMNTEDIEFGFCTEVMVRLDQTK---REFDEGTFRQDLSQFGDSLLVIADE 279

Query: 241 EIVKVHVHTEDPGLVMQEGLKYGSLVKVKVENMRNQHDA QMQKVEVEETVKETKEYG 297
+ KVH+H E+PG V+ YG L+K+K+ENMR QH + Q K ET + YG
Sbjct: 280 SIAKVHIHAEEPGNVLNYAQHYGELIKIKIENMREQHTSIISQESKPADNETPPAKQPYG 339

Query: 298 IIAVVAGDGLAEIFKSQGVDYIISGGQTMNPSTEDIVKAIEKVNARNVIILPNNKNIFMA 357
1+ V G+G+A++FKS G +1 GGQTMNPSTEDIV A++ VNA V ILPNN NI MA
Sbjct: 340 IVTVAMGEGIADLFKSIGASWIEGGQTr^PSTEDIVDAVKSVNADTVFILPNNSNIIMA 399

Query: 358 AQSAADVVDIPAAVVETRTVPQGFTSLLAFDPAKSLETNVADMTNSLSDVISGSVTLAVR 417
A AA WD V+ +TVPQG ++BLAF+P + EN A+M +++ V SG VT +VR
Sbjct: 400 ANQAASWDEQVFVIPAKTVPQGMSALIAFNPDQFAEANFJ^LSAIQQVKSGQVTFSVR 459

Query: 418 DTTIDGLEIHENDILGMVDGKILVSTPDMEKALKDTFDKMIDEDSEIVTIYVGEDGKQAL 477
DT IDG +1 + D +G+++G 1+ ++ + A K +MI ED EIVTI GED Q
Sbjct: 460 DTHIDGKDIKKGDFMGILNGTIIGTSENQLSAAKMLLSEMIGEDDEI VTILYGEDASQEE 519

Query: 478 AETLSEYLEETYEDVEVEIHQGDQPVYPYLMSVE 511
AE L +L E YE++EVEIH G QP+Y Y++S E
Sbjct: 520 AEQLEAFLSEKYEEIEVEIHNGKQPLYSYIVSAE 553





A related DNA sequence was identified in S.pyogenes <SEQ ID 5635> which encodes the amino acid 
sequence <SEQ ID 5636>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1816 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 434/511 (84%) , Positives = 475/511 (92%) 



Query: 1 MGMTMENGAKEVSDKPATTVGEVGQILSKGVLMGARGNSGVITSQLFRGFGQSIKDKEEL 60
M MTM+NGAKEV+DKPA+TVGEVGQ+LSKG+LMGARGNSGVITSQLFRGFGQSIK K+EL
Sbjct: 44 MSMTMDNGAKEVADKPASTVGEVGQMLSKGLLMGARGNSGVITSQLFRGFGQSIKGKDEL 103

Query: 61 TGQDLAHAFQNGVEVAYKAVMKPVEGTILTVSRGAATAALKKAEETDDAVEVMRATLKGA 120
TG+DLA AFQ GVEVAYKAVMKPVEGTILTVSRGAATAALKKA+ TDDAVEVM+A L GA
Sbjct: 104 163

Query: 121 KRALAKTPDMLPVLKEVGVVDSGGQGLVFIYEGFLSALTGEYIASEDFKATPATMTEMVN 180
K ALAKTPD+LPVLKEVGWDSGGQGLVFIYEGFLSAL G+Y+ S DFKATPA M+EM+N
Sbjct: 223

Query: 181 AEHHKAVVGHVATEDIKYGYCTEVMVGLKQGPTYVKEFNYEEFQGYLSNLGDSLLVVNDD 240
AEHHK+WGHVATEDI YGYCTE+MV LKQGPTYVKEFNY+EFQGYLS LGDSLLWNDD
Sbjct:

Query: 241 EIVKVHVHTEDPGLVMQEGLKYGSLVKVKVENMRNQHDAQMQKVEVEETVKETKEYGIIA 300
EIVKVHVHTEDPGLVMQEGLKYGSL+K+KV+NMRNQH+AQ+QK +VE+ E K++G+IA
Sbjct:

Query: 301 VVAGDGLAEIFKSQGVDYIISGGQTMNPSTEDIVKAIEKVNARNVIILPNNKNIFMAAQS 360
WAG+GL+EI FK+QGVDY+ 1 SGGQTMNPSTEDI VKAIE VNA+ VII LPNNKNI FMAAQS
Sbjct: 344 WAGEGLSEIFKAQGVDYVISGGQTMNPSTEDIVKAIEA.VNAKQVIILPNNKNIFMAAQS 403

Query: 361 AADVVDIPAAVVETRTVPQGFTSLLAFDPAKSLETNVADMTNSLSDVISGSVTLAVRDTT 420
AA+WDIPAAW TRTVPQGFTSLLAFDP+KSLE NVADM+ SLSDV+SGSVTLAVRDTT
Sbjct: 404 A&E WD I PAAWATR WPC^FTSLLAFDPSKSLEDiraRDMSTSLSDWSGSVTIjAVRDTT 463

Query: 421 IDGLEIHENDILGMVDGKILVSTPDMEKALKDTFDKMIDEDSEIVTIYVGEDGKQALAET 480
IDGLEIHEND LGMVDGKI+VS PDME LK F+KMIDEDSEIVTI+VGE+G Q LAE
Sbjct: 464 IDGLEIHENDFLGMVDGKIIVSNPDMEATLKAAFEKMIDEDSEIVTIFVGEEGDQDLAEE 523

Query: 481 LSEYLEETYEDVEVEIHQGDQPVYPYLMSVE 511
L+ YL ETYEDVEVEIHQGDQPVYPYLMSVE
Sbjct: 524 LAGYLGETYEDVEVEIHQGDQPVYPYLMSVE 554





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1870 

A DNA sequence (GBSx1978) was identified in S.agalactiae <SEQ ID 5815> which encodes the amino
acid sequence <SEQ ID 5816>. Analysis of this protein sequence reveals the following:

Possible site: 46

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4771 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
Example 1871

A DNA sequence (GBSx1979) was identified in S.agalactiae <SEQ ID 5817> which encodes the amino
acid sequence <SEQ ID 5818>. This protein is predicted to be proliferating-cell nucleolar antigen P120.
Analysis of this protein sequence reveals the following:

Possible site: 55

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3774 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9345> which encodes amino acid sequence <SEQ ID 9346>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC74905 GB:AE000278 putative nucleolar proteins [Escherichia 
coli K12] 

Identities = 87/229 (37%) , Positives = 128/229 (54%) , Gaps = 8/229 (3%) 

Query: 63 GKSIEHTTGLWSQEPAAQ--IVAQIAEPQEGMKVLDLAAAPGGKTTHLLSYLNNTGLLV 120
G + EH +GL Y QE ++ + A A+ +V+D+AAAPG KTT + + +NN G ++
Sbjct: 89 GSTAEHLSGLFYIQEASSMLPVAALFADGNAPQRVMDVAAAPGSKTTQISARMNNEGAIL 148

Query: 121 SNEISNKRSKILVENVERFGARNVIVTNESSQRLAKCFNSFFDLIVFDGPCSGEGMFRKD 180
+NE S R K+L N+ R G NV +T+ + FD 1+ D PCSGEG+ RKD
Sbjct: 149

Query: 181
P A++ W + E A QR+++ A L GG LVYSTCT + EENE V WL + Y
Sbjct: 209

Query: 240
++L L D+ G + + ++P + EG FVA+LR T++
Sbjct: 269 \7EFLPLGDL--FPGANKALTEEGFLHVFPQI YDCEGFFVARLRKTQA 315

A related DNA sequence was identified in S. pyogenes <SEQ ID 5819> which encodes the amino acid 
sequence <SEQ ID 5820>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.2316 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 213/311 (68%) , Positives = 254/311 (81%) , Gaps = 3/311 (0%) 

Query: 1 MKLPNEFIEKYQTILKDEAEAFFDSFEQKPISAYRTNPLKEKQLDFPNAIPSTPWGHYGK 60
M LP EFI YQ IL E E F SF Q+P++A+R NPLK + F + IP+T WG+YGK
Sbjct: 2 MSLPKEFINTYQAILGKELEDFtASFNQEPVNAFRINPLKNQLKTFEHPIPNTLWGYYGK 61

Query: 61 ISGKSIEHTTGLWSQEPAAQIVAQIAEPQEGMKVLDLARAPGGKTTHLLSYLNNTGLLV 120
+SGKS EH +GLVYSQEPAAQ+VAQ+A PQ+G +VLDLAARPGGK+THLL+YL+NTGLLV
Sbjct: 62 LSGKSPEHVSGLWSQEPAaQWAQVAAPQKGSRVLDLAAaPGGKSTHLLAYLDNTGLLV 121

Query: 121 SNEISNKRSKILVENVERFGARNVIVTNESSQRLAKCFNSFFDLIVFDGPCSGEGMFRKD 180
SNEIS KRSK+LVEN+ERFGARNV + VTNES+ RLAK F+ +FD IVFDGPCSGEGMFRKD
Sbjct: 122 SNEISKKRSKVLVENIERFGARNVWTNESADRLAKVFSHYFDTIVFDGPCSGEGMFRKD 181

Query: 181 PQAIQYWHKDYPTECAQLQRDILKEAIKMLaHQSILVYSTCTWSPEENEEVVNWLLQEYD 240
P AIQYWH YP ECA+LQ+ IL++A+ ML GG L+YSTCTW+PEENE+W WLL+ Y
Sbjct: 182 PDAIQYWHHGYPAECAKLQKSILEDAIAMLKPGGELIYSTCTWAPEENEDVVQWLLETYT 241

Query: 241 YLELVD I PKLNGMVEGINVPQVARMYPHHFQGEGQFVAKLRDTRS KEAQKI KPKAQKIN- 299
+LELVD+PKLNGMV GI +P+ ARMYPH +QGEGQFVAKL+D R +E ,Q K KA K N
Sbjct: 242 FLELVDVPKLNGWSGIGLPETARMYPHRYQGEGQFVAKLKDKR-QEGQSTKLKAPKSNL 300

Query: 300 -KMQLQLWQQF 309
K QL+LW+ F
Sbjct: 301 I KDQLRLWKMF 311

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1872 

A DNA sequence (GBSx1980) was identified in S.agalactiae <SEQ ID 5821> which encodes the amino
acid sequence <SEQ ID 5822>. Analysis of this protein sequence reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4111 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC24940 GB:AF012285 unknown [Bacillus subtilis] 
Identities = 86/240 (35%) , Positives = 133/240 (54%) , Gaps = 10/240 (4%) 



Query: 6 DFAKQLVYKAGQFIKSEMQNTFDVEEKSRFDDLVTSLDKKTQKLLIQEIIQHYPDDNILA 65
+ AK+ + +AG I M + +E KS +DLVT++DK+T+K I I + +P IL
Sbjct: 9 EIAKKWIREAGARITQSMHESLTIETKSNPNDLVTNIDKETEKFFIDRIQETFPGHRILG 68

Query: 66 EE---DBTOSPIAQGNVWVLDPIDGTWFIVQKDNFAVMIAYYEEGVGQFGIIYDVMADI 122
EE D + S +G VW++DPIDGT+NF+ Q+ NFA+ + +E G G+ G+IYDV+ D
Sbjct: 69 EEGQGDKIHS--LEGWWIIDPIDGTMNFVHQQRNFAISIGIFENGEGKIGLIYDWHDE 126

Query: 123 LYSGGGHFDVYANDKKIVPFQECPLERCLLGVNSAMYAEN DCGIAHLASETLGVRI 178
LY Y N+ K+ P +E +E +L +N+ EN +A L G R
Sbjct: 127 LYHAFSGRGAYMNETKLAPLKETVIEEAimiNATWOTENRRIDQSVIAPLVKRVRGTRS 186

Query: 179 YGGAGISMAKVMQGKLLAYFSY-IQPWDYAAAKIMGETLGFTLLTLDGEEPNYSTRQKVM 237
YG A + +A V G++ AY + + PWDYAA ++ +G T T++GE + V+
Sbjct: 187 YGSAALELANVAAGRIDAYITMRIAPTOYAAGCVLLNEVGGTYTTIEGEPFTFLENHSVL 246


246 



A related GBS nucleic acid sequence <SEQ ID 10937> which encodes amino acid sequence <SEQ ID 
10938> was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5823> which encodes the amino acid 
sequence <SEQ ID 5824>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1843 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 155/253 (61%) , Positives = 205/253 (80%) 

Query: 1 MDAKFDFAKQLVYKAGQFIKSEMQNTFDVEEKSRFDDLVTSLDKKTQKLLIQEIIQHYPD 60
++ K+ FA+Q++ +AG FIKS+M D++ K++FDDLVT++D++TQ+LL+ I Q YP
Sbjct: 8 LETKYAFARQIIKFAGLFIKSKNISEQLDIQvTCTQFDDLVTNVDQETQQLLMDRIHQTYPC 67

Query: 61 DNILAEEDBTOSPIAQGNVWVLDPIDGTWFIVQKDNFAVMIAYYEEGVGQFGIIYDVMA 120
D ILAEE++VR PI QGNVWV+DPIDGTVNFIVQ FAVM+AYYE+G+GQFG+IYDVMA
Sbjct: 68 DAIIAEENDWHPINQGNVWIDPIDGTVNFIVQGSQFAVMIAYYEQGIGQFGLIYDVMA 127

Query: 121 DILYSGGGHFDVYANDKKIVPFQECPLERCLLGVNSAMYAENDCGIAHLASETLGVRIYG 180
D L +GGG F+V N K+ +QE PLER L+G N+ M+A ND +AHL ++TLGVR+YG
Sbjct: 128 DQLIAGGGDFEVTLNGDKLPAYQEKPLERSLIGCNAGMFARNDRNLAHLIAKTLGVRVYG 187

Query: 181 GAGISMAK^QGKLLAYFSYIQPWDYAAAKIMGETLGFTLLTLDGEEPNYSTRQKVMFLP 240
GAGI M KVM+ +LLAYFS+IQPWDYAAAK++G+ LG+ LLT+DG EP++ TRQK+MF+P
Sbjct: 188 GAGICMVKVMKQELLAYFSFIQPWDYAAAKVLGDKLGYVLLTIDGYEPDFQTRQKIMFVP 247

Query: 241 KSKLNLIQSYLTK 253
K +L I S+LTK
Sbjct: 248 KCQLTRIASFLTK 260

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
25 vaccines or diagnostics. 

Example 1873 

A DNA sequence (GBSxl981) was identified in S.agalactiae <SEQ ID 5825> which encodes the amino 
acid sequence <SEQ ID 5826>. Analysis of this protein sequence reveals the following: 

Possible site: 16 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4131 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
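
The localization results above follow a fixed pattern: one line per compartment, each with a certainty score and a verdict. As an illustration only, the Python sketch below reads such lines and reports the highest-scoring compartment. The names `parse_psort` and `most_likely` are hypothetical, and the regular expression is an assumption chosen to tolerate the irregular spacing seen in scanned copies of this output; it is not part of the prediction program.

```python
import re

# Matches e.g. "bacterial cytoplasm Certainty=0. 4131 (Affirmative)".
# Spaces inside the number are tolerated and removed before conversion.
_PSORT_LINE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*"
    r"Certainty\s*=\s*([0-9][0-9 .]*)\(\s*([^)]*?)\s*\)",
    re.IGNORECASE,
)


def parse_psort(text):
    """Return {compartment: (certainty, verdict)} for each result line found."""
    results = {}
    for match in _PSORT_LINE.finditer(text):
        certainty = float(match.group(2).replace(" ", ""))
        results[match.group(1).lower()] = (certainty, match.group(3))
    return results


def most_likely(results):
    """Return the compartment name with the highest certainty score."""
    return max(results, key=lambda name: results[name][0])
```

Applied to the three result lines above, `most_likely` would select the cytoplasm, since the other two compartments score zero.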

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC24938 GB:AF012285 unknown [Bacillus subtilis]
Identities = 33/78 (42%) , Positives = 50/78 (63%)

Query: 13 YSYPLDPSWNTEDITKVLRFLNQVEHAYENS I KVDDLLDSYKEFKKWKSKAQEKQIDRE 72 

Y YP++ W TE+ V+ F QVE AYE ++LL +Y+ FK++V KA+EK++ E 

Sbjct: 3 YQYPMNEDWTTEEAVDVIAFFQQVELAYEKGADREELLKAYRRFKEIVPGKAEEKKLCGE 62 

Query: 73 FQRTSGYSTYQAVKAAQQ 90

F+ S YS Y+ VK A++ 
Sbjct: 63 FEEQSTYSPYRTVKQARE 80 
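
Each database hit above is introduced by a header line carrying a protein accession (GP:), a nucleotide accession (GB:), a free-text description and the source organism in square brackets. The following Python sketch shows one way such a header could be split into fields. The function name `parse_hit_header` and the field names are hypothetical, and the pattern is an assumption inferred from the headers printed in this document rather than a published format definition.

```python
import re

# ">GP:<protein acc> GB:<nucleotide acc> <description> [<organism>]"
# A stray comma before "GB:" (seen in scanned text) is tolerated.
_HIT = re.compile(r"^>?\s*GP:(\S+?)\s*,?\s*GB:(\S+)\s+(.*?)\s*\[(.+?)\]\s*$")


def parse_hit_header(header):
    """Split a GENPEPT hit header into its fields, or return None."""
    # Collapse line breaks so headers wrapped over two lines still match.
    flat = " ".join(header.split())
    match = _HIT.match(flat)
    if match is None:
        return None
    protein_acc, genbank_acc, description, organism = match.groups()
    return {
        "protein_acc": protein_acc,
        "genbank_acc": genbank_acc,
        "description": description,
        "organism": organism.strip(),
    }
```

Headers that wrap across lines should be passed in as a single string containing the line break; the function joins them before matching.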

A related DNA sequence was identified in S.pyogenes <SEQ ID 5827> which encodes the amino acid 
sequence <SEQ ID 5828>. Analysis of this protein sequence reveals the following:

Possible site: 60

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4442 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below.

Identities = 59/91 (64%) , Positives = 70/91 (76%)

Query: 9 ISSOTSYPLDPSVOTEDITKOTRFLNQWHAYENSIKVDDLLDSYKEFKKWKSKAQEKQ 68 

+S NY YPLD SW+TE+I+ VL FLN+VE AYE + LLDSYK +K +VKSKAQEKQ 
Sbjct: 5 MSGNYYYPLDLSWSTEEISSVLHFLNKA7ELAYEKKVDAKQLLDSYKTYKTIVKSKAQEKQ 64 

Query: 69 IDREFQRTSGYSTYQAVKAAQQQAKGFISLG 99 

IDR+FQ+ SGYSTYQ VK A+ KGF SLG 
Sbjct: 65 IDRDFQKVSGYSTYQWKKAKAIEKGFFSLG 95 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1874 

A DNA sequence (GBSx1982) was identified in S.agalactiae <SEQ ID 5829> which encodes the amino
acid sequence <SEQ ID 5830>. Analysis of this protein sequence reveals the following:

Possible site: 18 

>>> Seems to have no N-terminal signal sequence (or aa 1-18) 

Final Results 

bacterial cytoplasm --- Certainty=0.0952 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF21893 GB:AF103794 unknown [Listeria monocytogenes]

Identities = 74/126 (58%) , Positives = 101/126 (79%) 

Query: 1 MITLFLSPSCTSCRKARAWLSKHEVAFEEHNIITSPLMKEELLQILSFTENGTEDIISTR 60 
M+TL+ SPSCTSCRK+RAWL +H++ ++E NI + PL+ +E+ +IL TE+GT++IISTR 
Sbjct: 1 MVTLYTSPSCTSCRKSRAWLEEHDIPYKERNIFSEPLSLDEIKEILRMTEDGTDEIISTR 60

Query: 61 SKVFQKIAIDVDELSTSSLMELISENPSLLRRPIILDKKRMQIGFNEDEIRAFLPRDYRK 120 

SK FQKL +D+D L L ELI +NP LLRRPII+D+KR+Q+G+NEDEIR FLPR R 
Sbjct: 61 SKTFQKLNVDLDSLPLQQLFELIQKNPGLLRRPI I IDEKRLQVGYNEDEIRRFLPRRVRT 120 






Query: 121 QELKQA 126 
+L++A 

Sbjct: 121 YQLREA 126 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5831> which encodes the amino acid
sequence <SEQ ID 5832>. Analysis of this protein sequence reveals the following: 

Possible site: 49

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0511 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 112/134 (83%) , Positives = 127/134 (94%) 

Query: 1 MITLFLSPSCTSCRKARAWLSKHEVAFEEHNIITSPLNKEELLQILSFTENGTEDIISTR 60

M+TLFLSPSCTSCRKARAWL KHEV F+EHNIITSPL+++EL+ ILSFTENGTEDIISTR
Sbjct: 1 MVTLFLSPSCTSCRKARAWLVKHEVDFQEHNIITSPLSRDELMSILSFTENGTEDIISTR 60

Query: 61 SKVFQKIAIDVDELSTSSLMELISENPSLLRRPIILDKKRMQIGFNEDEIRAFLPRDYRK 120 

SKVFQKL IDV+ELS S L++LI++NPSLLRRPII+D+KRMQIGFNEDEIRAFL RDYRK 
Sbjct: 61 SKVFQKLDIDVEELSISDLIDLIAKNPSLLRRPIIMDQKRMQIGFNEDEIRAFLSRDYRK 120

Query: 121 QELKQATIRAEIEG 134

QEL+QATI+AEIEG 
Sbjct: 121 QELRQATIKAEIEG 134 
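
In the alignments shown here, the line between Query and Sbjct repeats the residue letter where the two sequences are identical and prints "+" where the substitution is scored as conservative. The Python sketch below, offered under that reading of the layout, counts identities and positives for one alignment segment. The function names are hypothetical. Counting from the middle line is only reliable on cleanly transcribed text, so a direct residue comparison is included as a cross-check.

```python
def midline_counts(midline):
    """Count (identities, positives) from the middle line of an alignment.

    Letters mark identical residues; '+' marks conservative substitutions.
    Positives are the identities plus the conservative substitutions.
    """
    identities = sum(ch.isalpha() for ch in midline)
    conservative = midline.count("+")
    return identities, identities + conservative


def identity_count(query, subject):
    """Count identical positions by comparing the two aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have equal length")
    return sum(q == s and q != "-" for q, s in zip(query, subject))
```

For the final segment above (residues 121 to 134), both functions give 12 identities, and the middle line gives 14 positives.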

SEQ ID 5830 (GBS232) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 51 (lane 10; MW 16.8kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 56 (lane 2; MW 42kDa). 

GBS232-GST was purified as shown in Figure 207, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1875 

A DNA sequence (GBSx1983) was identified in S.agalactiae <SEQ ID 5833> which encodes the amino
acid sequence <SEQ ID 5834>. Analysis of this protein sequence reveals the following: 

Possible site: 39

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 5835> which encodes the amino acid 
sequence <SEQ ID 5836>. Analysis of this protein sequence reveals the following: 

Possible site: 39

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1768 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 210/308 (68%) , Positives = 252/308 (81%) 



Query: 1 MKIHYINDYKDIQAKEDCTLVLGYFDGLHLGHKALFDKAKKIATEKNLKIVVLTFNETPR 60

M+I YI DY+DI ++D VL+LGYFDGLH GHKALFDKA+++A ++ LK+W TF E+P+
Sbjct: 1 MEIEYIKDYRDINQEDDTVLILGYFDGLHRGHKALFDKAREVANKEGLKWVFTFTESPK 60

Query: 61 LTFARFQPELLLHLTSPEKRSEKFQEYGVDELYLMNFTSHFSKVSSDLFIKKYIYGLRAK 120

L F+RF PELLLH+T P+KR EKF +YGV++LYL++FTS FSKVSSD FI YI L+AK
Sbjct: 61 LAFSRFSPELLLHITYPKKRYEKFADYGVNKLYLVDFTSKFSKVSSDHFITHYIKNLKAK 120

Query: 121 AAWGFDYKFGHNRTSGDYLARNFKGPVYIIDEISEGGEKISSTRIRQLITEGNVEKANQ 180

WGFDYKFGHNRT DYL RNF+G VY I+EI E KIS+T IR+LI EGNV KAN
Sbjct: 121 HIWGFDYKFGHNRTDSDYLTRNFEGQVYTIEEIKEDHRKISATWIRKLIQEGNVVKANH 180

Query: 181 LLGYEFSTCGMVVHGDARGRTIGFPTANLAPINRTYLPADGVYISNvIiINGKYYRAMTSI 240

LLGY+ ST G WHGDARGRTIGFPTANLAPI+ TYLPADGVY++NV++ K YR+MTS+
Sbjct: 181 LLGYDLSTRGRVVHGDARGRTIGFPTANLAPIDNTYLPADGVYVTNVIVANKIYRSMTSL 240

Query: 241 GKNITFGGTELRLEANIFDFDGDIYGETIEIFWLKRIRE^W ^ KFNGIDDLVKQLKKDKEIA 300

GKN+TFGG ELRLE NIFDFD +IYGE IEI WL +IR+M KF GI+DL +L+ DK A
Sbjct: 241 GKNVTFGGKELRLE VNI FDFDEEIYGEI IEIVWLDKIRDMEKFEGIEDLTDRLEYDKRTA 300

Query: 301 LNWKKDSQ 308

LNWKKDS+
Sbjct: 301 LNWKKDSK 308

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1876 

A DNA sequence (GBSx1984) was identified in S.agalactiae <SEQ ID 5837> which encodes the amino
acid sequence <SEQ ID 5838>. This protein is predicted to be tRNA pseudouridine 55 synthase (truB).
Analysis of this protein sequence reveals the following:

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2576 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9817> which encodes amino acid sequence <SEQ ID 9818> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB06129 GB:AP001515 tRNA pseudouridine 55 synthase [Bacillus halodurans]
Identities = 145/283 (51%) , Positives = 191/283 (67%) , Gaps = 12/283 (4%)

Query: 2 ITGIINLKKEAGMTSHDAVFKLRKILHTKKIGHGGTLDPDWGVLPIAVGKATRVIEYMT 61 
+TGI+ L K GMTSHD V KLR++L TKK+GH GTLDPDV GVLP+ +G AT+V +YM+

Sbjct: 3 MTGILPLAKPRGMTSHDCVAKLRRLLKTKKVGHTGTIJJPDvyGvLPVCIGHATKVAQyMS 62 

Query: 62 ESGKIYEGEITLGYATSTEDSSGEVISRTPLTQSDLSEDWDHAMKSFTGPITQVPPMYS 121 
+ K YEGE+T+G++T+TED SG+ + T Q E WD + +F G I Q+PPMYS 
Sbjct: 63 DYPKAYEGEVTVGFSTTTEDRSGDTVE-TKTIQQPFVEAVVDQVLATFVGEIKQIPPMYS 121

Query: 122 AVKVNGKKLYEYARSGEEVERPKRQITISEFRRTSPLYFEKGICRFSFYVSCSKGTYVRT 181 

AVKV GK+LYEYAR+G VERP+R +TI R S + +E+G+CRF F VSCSKGTYVRT 
Sbjct: 122 AVKVRGKELYEYARAGITVERPERTVTIFSLERMSDIVYEEGVCRFRFNVSCSKGTYVRT 181 


Query: 182 LAVDLGIKLGYASHMSFLKRTSSAGLSITQSLTLEEINEKYKQ-EDFSFLLPIEYGVLDL 240 

LAVD+G LGY +HMS L RT S S+ + T E+ E+ +Q E S LLPIE +LD+ 
Sbjct: 182 jQAVDIGKALGYPAHMSDLVRTKSGPFSLEECFTFTELEERLEQGEGSSLLLPIETAILDI 241 

Query: 241 PKVNLTEEDKVE I S YGR RILLENEADTLAAFYE 273

P+V + +E + +1 +G R + NE L A Y+ 

Sbjct: 242 PRVQVNKEIEEKIRHGAVLPQKWFNHPRFTVYNEEGALLAIYK 284 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5839> which encodes the amino acid 
sequence <SEQ ID 5840>. Analysis of this protein sequence reveals the following:

Possible site: 56 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2698 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>






An alignment of the GAS and GBS proteins is shown below. 

Identities = 201/295 (68%), Positives = 246/295 (83%), Gaps = 2/295 (0%)

Query: 1 MITGIINLKKEAGMTSHDAVFKLRKILHTKKIGHGGTLDPDWGVLPIAVGKATRVIEYM 60

MI GIINLKKEAGMTSHDAVFKLRK+L KKIGHGGTLDPDWGVLPIAVGKATRVIEYM 
Sbjct: 1 MINGIINLKKEAGMTSHDAVFKLRKLLQEKKIGHGGTLDPDWGVLPIAVGKATRVIEYM 60 

Query: 61 TESGKIYEGEITLGYATSTEDSSGEVISRTPLTQSDLSEDWDHAMKSFTGPITQVPPMY 120

TE+GK+YEG++TLGY+T+TED+SGEV++R+ L + L+E++VD M +F G ITQ PPMY 
Sbjct: 61 TEAGKVYEGQVTLGYSTTTEDASGEWARSSL-PAVLTEELVDQTMTTFLGKITQTPPMY 119 

Query: 121 SAVKVNGKKLYEYARSGEEVERPKRQITISEFRRTSPLYF-EKGICRFSFYVSCSKGTYV 179 
SAVKVNG+KLYEYAR+GE VERP+R++TIS F RTSPL F E G+CRFSF V+CSKGTYV

Sbjct: 120 SAVKVNGRKLYEYARAGESVERPRREVTISLFERTSPLNFTEDGLCRFSFKVACSKGTYV 179 

Query: 180 RTLAVDLGIKLGYASHMSFLKRTSSAGLSITQSLTLEEINEKYKQEDFSFLLPIEYGVLD 239 
RTLAVDLG LG SHMSFL+R++SAGL++ + TL EI + +++ SFLLPIEYGV D 
Sbjct: 180 RTLAVDLGRALGVESHMSFLQRSASAGLTLETAYTLGEIADMVSKQEMSFLLPIEYGVAD 239

Query: 240 LPKVNLTEEDKVE I SYGRRI LLENEADTIiAAF YENRVIAI LEKRGNEFKPHKVLL 294 

LPK+ + + + EIS+GRR+ L ++ IAAF+ +VIAILEKR E+KP KVL+ 
Sbjct: 240 LPKMVIDDTELTEISFGRRLSLPSQEPLLAAFHGEKVIAILEKRDQEYKPKKVLI 294 
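
The summary line that heads each alignment reports identities, positives and (where present) gaps as counts over the aligned length. A small Python sketch for reading that line is given below. The function names are hypothetical, and the use of integer division to obtain a whole-number percentage is an assumption: it reproduces the figures printed for this alignment, but the document does not state how the percentages were rounded.

```python
import re

# Matches "Identities = 201/295", "Positives = 246/295", "Gaps = 2/295".
_STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_stats(line):
    """Return {'identities': (count, length), ...} from a summary line."""
    stats = {}
    for name, count, length in _STAT.findall(line):
        stats[name.lower()] = (int(count), int(length))
    return stats


def percent(count, length):
    """Whole-number percentage, rounded down (an assumed convention)."""
    return 100 * count // length
```

For the summary line of the alignment above, this yields 68 for identities, 83 for positives and 0 for gaps, matching the printed values.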


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1877 

A DNA sequence (GBSx1985) was identified in S.agalactiae <SEQ ID 5841> which encodes the amino
acid sequence <SEQ ID 5842>. Analysis of this protein sequence reveals the following:

Possible site: 50

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2776 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9819> which encodes amino acid sequence <SEQ ID 9820> 
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12871 GB:Z99109 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 39/145 (26%) , Positives = 68/145 (46%) , Gaps = 7/145 (4%) 

Query: 3 MKIRTATLDDSEKLVPLYQELG YAISLSEIQSILKVILTHSDYGFLIAEDNGKLLA 58

M IR A D+ + PL+ + A L ++ LK h + + LIAE+NG+ + 

Sbjct: 1 IWIRQAKTSDAAAIAPLFNQYREFYRQASDLQGAEAFLKARLENHESVILIAEENGEFIG 60 

Query: 59 FVGYHKLYFFEKSGTYYRIIiALVVNEKHRRKGIASQLINHWQLAKTDGSEVLALNSSLK 118 
p ++ Y+LVRKG +L++ K A +G++ h L + +

Sbjct: 61 FTQLYPTFSSVSMKRIYILNDLFWPHARTKGAGGRLLSAAKDYAGQNGAKCLTLQT--E 118 

Query: 119 EYRQEAYHFYENLGFKKVSTGFSYY 143 
+ ++A YE G+++ TGF +Y 
Sbjct: 119 HHNRKARSLYEQNGYEE - DTGFVHY 142

A related DNA sequence was identified in S.pyogenes <SEQ ID 5843> which encodes the amino acid 
sequence <SEQ ID 5844>. Analysis of this protein sequence reveals the following: 

Possible site: 49

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0962 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 37/126 (29%) , Positives = 64/126 (50%) , Gaps = 16/126 (12%) 

Query: 18 PLYQE LGYAISLSEIQSILKVILTHSDYGFLIA--EDNGKLLAFVG YHKLYF 67 

P+ QE LGY +SL ++ + ++ + FL +D +LL +V Y LY 

Sbjct: 11 PMLQEINAKALGYLVSLDLLERQYERLIEDCIHHYFLAYADKDTNQLLGYVHAERYETLY- 69 

Query: 68 FEKSGTYYRILALvVNEKHRRKGIASQLINHvKQIAKTDGSEVLMiNSSLKEYRQEAYHF 127 

+ +L L V ++R+GI S L+ ++ A+ +G + LNS+ +R+EA+ F 

Sbjct: 70 ASDGIiNLLGLAVLPAYQRRGIGSALLRALESQARQEGIAFIRliNSA- -SHRKEAHAF 124 

Query: 128 YENLGF 133

Y NL + 
Sbjct: 125 YRNLDY 130 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1878 

A DNA sequence (GBSx1986) was identified in S.agalactiae <SEQ ID 5845> which encodes the amino
acid sequence <SEQ ID 5846>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1659 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

RGD motif 28-30 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF30776 GB:AE002133 conserved hypothetical [Ureaplasma urealyticum]

Identities = 106/440 (24%) , Positives = 206/440 (46%) , Gaps = 65/440 (14%) 

Query: 13 FAINESEYHQLLEQIRGDAFDKEVSERLEKERLILGEQAKNQLQEVWE-KDKEIAKLQY 71 
F N+ +Y++L++Q +D LEK+R L E+ KN+ + + KD + K

Sbjct: 71 FLANDRDYNELVKQ RYD LEKQRDELKEKLKNEGNKAIAHFKDSDEYKNLI 120 

Query: 72 KVKQFLIEKDNLLKDNEYQLAEQLNQKDMMLRD LENQIDRLRLEHENSLQEA 123 

K ++ + + ++ NE +++ ++ L+ L+N I + ++ +N+ + A 

Sbjct: 121 KAQEKINSLNKTIESNEQSYKKEIENIELKLKSQFDEETKSLKNTIAKQEIKLDNAEKMA 180

Query: 124 LTKVERE RDAIQNQLHIQ EKEKDLALASVKSDY 156 

+ + +D I + 1+ E +K + + ++S 

Sbjct: 181 IINFKESNEYQKIIKDKIDLDIEIEKLKFAIQAHEDNMKAAKENWESKKIVEIKELESKK 240 


Query: 157 EVQLKAANEQVEFYKNFKAQQSTKAVGESLEHYAETEFNKVRHLAFPNAYFEKDNTLSSR 216 

+ ++ E +E K K+ + K VGE LE + + +F++ + P+ F K N 
Sbjct: 241 DKEIHKLTESIEQLKREKSS-NVKLVGEELEQWLKNKFDETYSFSCPDMTFTKINEAID- 298 

Query: 217 GSKGDFIY REKDENDLEFL-SIMFEMKNESDDTIKKHKNEDFFKELDKDRREKS 269

G K DF+ +E +D + + S EKEDKKN +K+LD+DR + 

Sbjct: 299 GKKADFLLEFFDFGKEMSNDDKKLIFSATIEAKTEFFDNQKGTKNSAHYKKLDQDRINQK 358 

Query: 270 CEYAVIiWMLEADNDYYNTGIVDVSHKYPKMWIRPQFFIQLIGILRNAAIiNTLKYKQEL 329 
EYA+LVT LE ++ + ++ ++Y M+ +RPQ+FI L+ ++RN A TLK K

Sbjct: 359 SEYAI LVTELEPEDHF VIKKINEYKNMFAVRPQYFIPLVDMIRNFA--TLKAKINS 412

Query: 330 ALMKEQNIDITHFEEDLDIFKNAFAKN-YNSASKNFQKAIDEIDKSIKRMEAV-KAALTT 387

+++ + D EE+LD K N + +K ID+ IK+ E++ ++A 

Sbjct: 413 QIIRYE--DRAKIEENLDELKKDIVDOTLKYINDKTKKIIDDSKAIIKKAESIEESAEDI 470 


Query: 388 SENQLRLANNKLDDVSVKKL 407 

+L K+++++++K+ 
Sbjct: 471 INKKLNTLKKKINELTIRKI 490 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5847> which encodes the amino acid
sequence <SEQ ID 5848>. Analysis of this protein sequence reveals the following: 

Possible site: 33

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3192 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 310/445 (69%) , Positives = 352/445 (78%) , Gaps = 22/445 (4%) 

Query: 1 MNEIKCPHCGTAFAINESEYHQLLEQIRGDAFDKEVSERLEKERLILGEQAKNQLQEVW 60 
MNEIKCPHC T F INESEY QLLEQ+RG AFD+E+ +RL E +L E+AK+QL EW 
Sbjct: 1 MNEIKCPHCHTLFTINESEYSQLLEQVRGQAFDEELKKRLINEIALLEEKAKHQLHEWA 60

Query: 61 EKDKEIAKLQYKVKQF LIEKDNLL- KDNEYQLAEQLNQK 98 

+K+ I h +++Q L +KD L+ N +LA QL +K 

Sbjct: 61 KKETAITSLTNQLEQIEKEQAYLRQEELAKKDQLIASLEAKLDKLASQNALELANQ 120 


Query: 99 DMMLRDLENQIDRLRLEHENSLQEALTKVERERDAIQNQLHIQEKEKDLALASVKSDYEV 158 

D + L NQ+D+L LE + + Q L +E+ERD I+NQL +Q KE +L+LASV+SDYE 
Sbjct: 121 DKEWSLTNQLDKLALEKDATFQSKLATIEKERIXSIKNQLALQAKESELSL^VRSDYEA 180 

Query: 159 QLKAANEQVEFYKNFKAQQSTKAVGESLEHYAETEFNKVRHLAFPNAYFEKDNTLSSRGS 218

QLKAANEQVEFYKNFKAQQSTKA+GESLE YAETEFNKVR AFPNA F KDN LSSRGS 
Sbjct: 181 QLKAANEQVEFYKNFKAQQSTKAIGESLELYAETEFNKVRSYAFPNASFVKDNQLSSRGS 240 

Query: 219 KGDFIYREKDFJTOLEFLSIMFEMKNESDDTIKKHKNEDFFKELDKDRREKSCEYAVLVTM 278 
KGD+IYRE D N +E LSIMFEMKNE+D T KHKN DFFKELDKDRREK CEYAVLV+M

Sbjct: 241 KGDYIYREVDANGVEILSIMFEMKNEADTTKTKHKNSDFFKELDKDRREKDCEYAVLVSM 300 

Query: 279 IiEADNDYYNTGIVDVSHKYPKMWIRPQFFIQLIGILRNAaiMTLKYKQELALMKEQNID 338 
LEADNDYYNTGIVDVSH+Y KMYV+RPQ FIQLIGILRNAALN+L YKQELAL+KEQNID 
Sbjct: 301 LEADNDYYNTGIVDVSHEYQmmTRPQLFIQLIGILRNAAmSLHYKQEIALVKEQNID 360

Query: 339 ITHFEEDLDIFKNAFAK^^YNSASKNFQKAIDEIDKSIKRMEAVKAALTTSENQLR^lANNK 398 

ITHFEEDLD FKNAFAKNY SAS NF+KAIDEIDKSIKRME VK LTTSENQLRLAMNK 
Sbjct: 361 ITHFEEDLDQFKNAFAK1JYQSASNNFK2CAIDEIDKSIKRMEEVKRFLTTSENQLRIjANNK 420 


Query: 399 LDDVSVKKLTRKNPTMKAKFDALKD 423 

L+DVSVKKLTR+NPTM+ KF+ALKD 
Sbjct: 421 LEDVSVKKLTRQNPTMREKFEAIjKD 445 

SEQ ID 5846 (GBS304) was expressed in E.coli as a His-fusion product. The purified protein is shown in
Figure 206, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1879

A DNA sequence (GBSx1987) was identified in S.agalactiae <SEQ ID 5849> which encodes the amino
acid sequence <SEQ ID 5850>. This protein is predicted to be an unnamed protein product. Analysis of this
protein sequence reveals the following:

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1845 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5851> which encodes the amino acid
sequence <SEQ ID 5852>. Analysis of this protein sequence reveals the following:

Possible site: 51

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2492 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 113/180 (62%) , Positives = 141/180 (77%)






Query: 16 LSELVDCFKGKRVPSKAEAGDIRIINLSDMSPLGIDYHNLRTFQDEQRSLLKYLLQEGDV 75 

L +VDCFKGKAV SK GD+ +INLSDM LGI YH LRTFQ ++R LL+YLL++GDV 
Sbjct: 18 LGTWDCFKGKAVSSKWPGDVGLINLSDMGTLGIQYHQLRTFQMDRRQLLRYLLEDGDV 77 

Query: 76 LIASKGTVKKVAIFEEQDYPWASANITILRPTQHIRGYYLKLFFDSEEGQQALENANKG 135 

LIASKGT+KKV +F +Q+ WAS+NIT+LRP + +RGYY+K F DS GQ L+ A+ G 
Sbjct: 78 LIASKGTLKKVCVFHKQNRDWASSNITVLRPQKLLRGYYIKFFLDSPIGQALLDVADHG 137 

Query: 136 KAVMNISTKELLNIAIPSIPLFRQDYLIQRYKQGIiNDYKRKIARAEQEWERIQNDIRQQL 195

K V+N+STKELL+I IP IPL +QDYLI Y +GL DY RK+ RAEQEWE IQN+I++ L 
Sbjct: 138 KDVINLSTKELLDIPIPVIPLVKQDYLINHYLRGLTDYHRKLNRAEQEWEYIQNEIQKGL 197 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1880 

A DNA sequence (GBSx1988) was identified in S.agalactiae <SEQ ID 5853> which encodes the amino
acid sequence <SEQ ID 5854>. Analysis of this protein sequence reveals the following: 

Possible site: 15
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.43 Transmembrane 62 - 78 ( 55 - 82) 
INTEGRAL Likelihood = -2.87 Transmembrane 130 - 146 ( 130 - 150) 
INTEGRAL Likelihood = -1.28 Transmembrane 37 - 53 ( 37 - 53) 
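
Each INTEGRAL line above gives a likelihood score, the residue range of a predicted transmembrane segment, and a wider range in parentheses. As an illustrative sketch only, the Python below extracts the score and the first range and orders the segments along the sequence. The names `TMSegment` and `parse_tm_segments` are hypothetical. Treating the most negative likelihood as the strongest prediction is an assumption about how this score is conventionally read, not something stated in this document.

```python
import re
from typing import NamedTuple


class TMSegment(NamedTuple):
    likelihood: float
    start: int
    end: int


# Captures the score and the first residue range; the range in
# parentheses that follows on each line is ignored here.
_TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_tm_segments(text):
    """Return predicted transmembrane segments sorted by start position."""
    segments = [
        TMSegment(float(score), int(start), int(end))
        for score, start, end in _TM.findall(text)
    ]
    return sorted(segments, key=lambda seg: seg.start)
```

For the three lines above, the segments come back in the order 37-53, 62-78, 130-146, and the segment at 62-78 carries the most negative score.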

Final Results

bacterial membrane --- Certainty=0.3972 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9347> which encodes amino acid sequence <SEQ ID 9348>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA22372 GB:AL034446 putative transmembrane protein
[Streptomyces coelicolor A3(2)]

Identities = 38/139 (27%) , Positives = 64/139 (45%) , Gaps = 5/139 (3%) 

Query: 15 SASVEILCRGWLLPVSATKYSKIVSVSISSIFFGLLHSANNHVSLISIFNLCL-FGLFLS 73 
+A+ E++ RG L + +++ ++ + FGL+H N +L + + G L+ 

Sbjct: 143 AATEEWFRGVIiFRIIEEHIGTYIjALGLTGLvTOLMLIjNEDATLWGALAIAIEAGFMIA 202






Query: 74 LYVILKGNIWGACGIHGAWNCVQGSVFGIEVSGEPMLSNSLVHVKTYGADWISGGKFGVE 133 

N+W G+H WN G VF VSG S L+ G ++GG FG E 

Sbjct: 203 AAYAATRNLWLTIGVHFGWNFAAGGVFSTWSGNGD-SEGLLDATMSGPKLLTGGDFGPE 261 

Query: 134 GSMIT SI VLIVACYWL 149 

GS+ + ++L + WL 
Sbjct: 262 GSVYSVGFGVLLTLVFLWL 280 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1881 

A DNA sequence (GBSx1989) was identified in S.agalactiae <SEQ ID 5855> which encodes the amino
acid sequence <SEQ ID 5856>, which is a methylase gene homolog. Analysis of this protein sequence
reveals the following: 

Possible site: 33

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2192 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

RGD motif: 264-266

A related GBS nucleic acid sequence <SEQ ID 9929> which encodes amino acid sequence <SEQ ID 9930> 
was also identified. 






The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA87672 GB:AB016260 Hypothetical gene, methylase gene homolog 
[Agrobacterium tumefaciens] 
Identities = 358/1238 (28%) , Positives = 595/1238 (47%) , Gaps = 99/1238 (7%) 

Query: 1072 KEVARIKGMTOIRNAYQEVIAIQRYYDYDKETFNHLLGKLNRTYDSFVKHYGYIiNSAV-- 1129
K V 1+ ++ IR+A +EV+ Q + Ii +Ii + SFV+ +G +N 

Sbjct: 497 KHVRIIRKLIPIRDAVREVLKAQEL DRPWKDLQVRLRVAWSSFVRDFGPINHTTVS 552 

Query: 1130 NRNLFDSDDKYSLLASLEDESL--DPSGKSVIYTKSLAFEKAL 1170 

N F D L+AS+ED L D + I+T E+ +

Sbjct: 553 ITEDPESGETRESHRRPNLQPFADDPDCWLVASIEDYDLENDTAKPGAIFT ERVI 607 

Query: 1171 WPEKEVKKVHTALDALNSSLADGRGVDFAYMMSIYQVESQMTLIEELGDLIMPDPEKYL 1230 
P V + +A DAL L + VD ++ + + ++ ELG I DP 
Sbjct: 608 SPPAPPV- -ITSAADAIAVVLNERGRvDLDHIAELLHRDPD-DVVAEIiGSAIFRDP 660

Query: 1231 NGELTYVSRQDFLSGDWTKLE WDLFVKQDNQDFNWSHYAGLLEAI KPARI TLAD I DYR 1290

+ ++ +LSG V KL+V + D ++ L ++P + +DI R 

Sbjct: 661 - ADGSWQMADAYLSGPVRDKLKVAEAAAALD PV YNRNVTALAGVQPVDLRPSD I TAR 716 


Query: 1291 IGSRWIPLAVYGKFAQETFMGKAYELSDQ-EVATVLEVSPIDGVITYQSKFAYTYSNATD 1349 

+G+ WIP A F +E MG + E+A+ + G + AT TD 

Sbjct: 717 LGAPWI PAAD WAFVKE - MMGTD I R I HHMPELASWTVEARQLGYLA AGTSEWGTD 770 

Query: 1350 RSLGVPASRYDSGRKIFENLLNSNQPTITKQVVEGDKKKNVTDVEKTTVLRAKETHLQEL 1409
R ++ + LNS PI + +GD ++ V +V T + K +++ 

Sbjct: 771 RR HAGELLSDALNSRVPQIFDTIRDGDSERRVIJSIVVDTEAAKEKLHKIKDA 821 

Query: 1410 FQGFVAKYPEVQQMIEDTYNRLYNRTVSKSYDGSHLTIDGLAQNISLRPHQKNAIQRIVE 1469 
FQ ++ P+ + YN +N + + G HL + G + L HQK I RI+

Sbjct: 822 FQRWIWSDPDRTDRIARVYM3RFNNI APRKFSGDHLNLPGASGAFVLYGHQKRGIWRI I S 881 

Query: 1470 EKRALIAHEVGSGKTLTMLGAGFKLKEKSMVHKPLYWPSSLTAQFGQEIMKFFPTKKVY 1529 
LAH VG+GKT+TM + + + LG++ K + WP AQ +E + +PT ++ 
Sbjct: 882 SGSTYLAHAVGAGKTMTMAASIMEQRRLGLIAKAMQWPGHCIAQAAREFLALYPTARIL 941

Query: 1530 VTTKKDFAKAKRKQFVSRI ITGDYDAI VIGDSQFEKI PMSREKQVTYINDKLEQLREI KL 1589 

V + +F+K KR +F+SR T +DAI+I S F I + + I+D+LE + L 
Sbjct: 942 VADETNFSKDKRARFLSRAATATWDAIIITHSAFRFIGVPAAFESQMIHDELELYETLLL 1001 


Query: 1590 GSDSDYTV- - KEAERS 1 KGLEHQLEELQKLERDTF IEFENLGIDFLFVDEAHHFKNIRP I 1647 

+ + V K ER +GL+ +LE L +D + +G+D + VDEA F+ + 
Sbjct: 1002 KVEDEDRVSRKRLERLKEGLQERLEALST-RKDDLLTIAEIGVDQIIVDEAQEFRKLSFA 1060 

Query: 1648 TGLGWAGITJSrTTSKKNVDMEMKVRQVQAEHGDRNVVFATGTPVSNSISELFTMMDYIQP 1707

T + + G+ S++ D+ +K R ++ + R +V A+GTP++N++ E+F++ + 
Sbjct: 1061 TNMSTLKGVDPNGSQRAWDLYVKSRFIETINPGRALVLASGTPITNTLGEMFSVQRLMGH 1120 

Query: 1708 DVLERYLVSNFDSWVGAFGNIENSMELAPTGDKYQPKKRFKKFVNLPELMRIYKETADI- 1766 
LE + FD+W FG+ +EL P+G KY+P RF FVN+PEL+ +++ AD+

Sbjct: 1121 AALEERGLHEFDAWASTFGDTTTELELQPSG-KYKPVSRFASFVNVPELIAMFRSFADW 1179 

Query: 1767 QTSDMLDLP-VPEAKIIAVESELTQAQKYYLEELVKRSDAIKSGS- -VDPSRDNMLK 1820 

+ + + p + + v S+ TQA K++ I) +R AI+ P D +L 

Sbjct: 1180 MPADLREYVKVPAISTGRRQIVTSKPTQAFKHHQMVLAERIKAIEERERPPQPGDDILLS 1239

Query: 1821 ITGEARKLAIDMRLIDPTYSLSDNQKILQVVDNVERIYRDGAGDK AT 1867 

+ + R AID+RL+D + K+ +V N RI++ AG A 

Sbjct: 1240 VITDGRHAaiDLRLVDADNDNEPDNKLNNLVSNAFRIWKATAGSVYLRHDSKPFEVPGAA 1299


Query: 1868 QMIFSDIGTPK-SKEEGFDVYNELKDLE'VDRGIPKEEIAFWDANTDEKKNSLSRKVNSG 1926 

QMIFSD+GT K GF Y ++D + G+P EIAF+ D E K L V +G 
Sbjct: 1300 QMI FSDLGTI SVEKTRGFSAYRWI RDEL I RLGVPASE I AFMQDFKKSEAKQRLFGDVRAG 1359 

Query: 1927 EVRILMASTEKGGTGLNVQSRMKAVHYLDVPWRPSDIVQRNGRLIRQGNMHQEVDIYHYI 1986

VR L+ S+E GTG+NVQ R+KA+H+LDVPW PS I QR GR++RQGN H EVDI+ Y 
Sbjct: 1360 RTOFLIGSSETMGTGVNVQLRLKALHHLDVPWLPSQIEQREGRITOQGNQHDEVDIFAYA 1419 

Query: 1987 TKGSFDNYLWQTQENKLKYITQIMTSKDPVRSAEDIDE-QTMTASDFKALATGNPYLKLK 2045 
T+GS D +WQ E K ++I ++ +R EDI E Q + KA+A+G+ L K

Sbjct: 1420 TEGSLDATMWQNNERKARFIAAALSGDTSIREIiEDIGEGQANQFAMAKAIASGDQRLMQK 1479 

Query: 2046 MELENELTVLENQKRAFNRSKDEYRHTISYSEKHLPIMEKRLSQYDKDIAQSLATKSQDF 2105
LE ++ LE + A + R + +E+ + + +R+++ +DI + + T +DF 
Sbjct: 1480 AGLEADIARLERLRAAHIDDQHAVRRQLRDAERDIEVSTRRIAEIGQDITRLVPTTGEDF 1539

Query: 2106 vmFDNQAMDNRAEAGDYLRK-LITYNRSETKETOTLASFRGFDLKM-TTRGASEPLPET 2163 

M + R EAG L K ++T + + +AS GF+L+ R + T 
Sbjct: 1540 TMTVAGKDYSERKEAGRALMKEILTLVQLSPEGEAVIASIGGFELEYHGQRYGKDGYRYT 1599 






Query: 2164 ISLMIVGDNQYTVALDLK-SDVGTIQRISNAIDHIIDDQEKTQELVKDLKDKLRVAKVEV 2222 

L G + Y + L+ ++G + R+ +A+D ++E+ ++ + D + +L + 
Sbjct: 1600 TMLKRTGAD-YEIELPVTVTPLGAVSRLEHALDDFDGERERYRQRLGDARRRIASYQSRG 1658

Query: 2223 DKVFPKEEDYQLVKAKYDVIAPLvEKEAEIEEIDAALA 2260
+ +++ L EK ++ E++ ALA 

Sbjct: 1659 E --GSEFAFAGEIAEKHRQLAEVETALA 1684 

Identities = 99/271 (36%) , Positives = 153/271 (55%) , Gaps = 10/271 (3%) 



Query: 607 RDKVETNI VAI RLVKNLEVEHRNAS PSEQELL&KYVGWGG - - IANEFFD DYNPKF 659

+D+ NI AIRL +E R A+ EQE L ++ G+G LAN F ++ +
Sbjct: 80 KDRARDNIAAIRLAAEIEASERPATREEQETLIRFTGFGASDLANGVFRRPGELEFRKGW 139

Query: 660 SKEREELKSLVTDKEYSDMKQSSLTAYYTDPSLIRQMWDKLERDGFTGGKILDPSMGTGN 719

+ +L+ V + +Y+ + + + A++T ++R +W L+R G+ GG++L+P +GTG
Sbjct: 140 DEIGSDLEDAVGETDYASIARCTQYAHFTPEFIVRAIWSGL^ 199

Query: 720 FFAAMPKHLREKSELYGVELDTITGAIAKHLHPNSHIEIKGFETVAFNDNSFDLVISNVP 779

F A MP+ LR+ S + GVELD +T I+LP + I F SFDL I N P
Sbjct: 200 FPALMPEALRDLSHVTGVELpPVTAC I VRLLQPRARI LTGDFARTEL - PAS FDLAIGNPP 258

Query: 780 FANIRIADNRYDRP--YMIHDYFVKKSLDLLHDGGQVAIISSTGTMDKRTENILQDIRET 837

F++ + +R R +HDYFV +S+DLL G A ++S+GTMDK Q I T
Sbjct: 259 FSDRTVRSDRAYRSLGLRLHDYFVARSIDLLKPGAFAAFVTSSGTMDKADSAARQHIATT 318

Query: 838 TEFLGGVRLPDSAFKAIAGTSVTTDMLFFQK 868

+ + +RLP+ +F+A AGT V D+LFF+K
Sbjct: 319 ADLIAAIRLPEGSFRADAGTDWVDILFFRK 349





SEQ ID 5856 (GBS327N) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total 
cell extract is shown in Figure 148 (lane 8-10; MW 140kDa). It was also expressed in E.coli as a His-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 148 (lane 11-13; MW 115kDa) and in 
Figure 182 (lane 8; MW 115kDa). 

Purified GBS327N-GST is shown in Figure 243, lane 5; purified GBS327N-His is shown in Figure 235,
lane 5.

GBS327C was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is 
shown in Figure 148 (lane 14; MW 73kDa). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1882 

A DNA sequence (GBSx1990) was identified in S.agalactiae <SEQ ID 5857> which encodes the amino
acid sequence <SEQ ID 5858>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3656 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1883

A repeated DNA sequence (GBSx1991) was identified in S.agalactiae <SEQ ID 5859> which encodes the
amino acid sequence <SEQ ID 5860>. This protein is predicted to be a giant membrane protein. Analysis of
this protein sequence reveals the following:

Possible site: 33 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.3698 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG19662 GB:AE005054 calcium-binding protein homology; Cbp 
[Halobacterium sp. NRC-1] 
Identities = 22/43 (51%) , Positives = 29/43 (67%) , Gaps = 1/43 (2%) 

Query: 9 KDSDQDGLTDAQELAL-GTDPQSVDTDGDGQADLEELQSGHSP 50

+D+D DGL+D E+ + GTDP DTDGDG D EL++G P 
Sbjct: 198 RDTDDDGLSDGVEVRVAGTDPTERDTDGDGVDDAAELRAGSLP 240 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1884 

A DNA sequence (GBSx1992) was identified in S.agalactiae <SEQ ID 5861> which encodes the amino
acid sequence <SEQ ID 5862>. Analysis of this protein sequence reveals the following: 

Possible site: 52

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.39 Transmembrane 1609 - 1625 (1609 - 1625)
INTEGRAL Likelihood = -1.81 Transmembrane 30 - 46 ( 29 - 46)

Final Results

bacterial membrane --- Certainty=0.1956 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

LPXTG motif 1600-1604 
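
Motif annotations such as the one above give residue coordinates that appear to be 1-based and inclusive: a five-residue LPXTG motif spans 1600-1604, and a three-residue RGD motif spans three positions. A minimal Python sketch for locating such motifs in a protein sequence follows. The function name `find_motif` and the constants are hypothetical, and "X" in LPXTG is taken to mean any single residue. The sequence of SEQ ID 5862 is not reproduced in this section, so the sketch would need to be run on a sequence obtained from the sequence listing.

```python
import re

RGD = "RGD"      # literal tripeptide
LPXTG = "LP.TG"  # '.' stands for the variable residue X


def find_motif(sequence, pattern):
    """Return 1-based inclusive (start, end) pairs for each motif match."""
    return [
        (match.start() + 1, match.end())
        for match in re.finditer(pattern, sequence)
    ]
```

For example, `find_motif("AAALPKTGEE", LPXTG)` returns `[(4, 8)]`, a five-residue span in the same coordinate style as the annotation above.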

The protein has homology with the following sequences in the GENPEPT database. 


>GP:CAA40973 GB:X57841 antigen I /II [Streptococcus sobrinus] 
Identities = 419/1436 (29%) , Positives = 608/1436 (42%) , Gaps = 310/1436 (21%) 

Query: 23 KSKKYRTLCSVALGTMVTAWAWGGTVAHADEVTTSV DTTIQRTE- -NPATNLPEA 76 

K K RTL LGT + A A G A A+E +T+ DT + TE NPATNLP+ 

Sbjct: 23 KVICSGRTLSGALLGTAIIjASGA--GQKAIjAEETSTTSTSGGDTAVVGTETGNPATNLPDK 80 

Query: 77 QPNP VSEQTESMASTGQSNGAIAVTVPHDTVT CAVE 112 

Q NP V T + +S VTV D + + 

Sbjct: 81 QDNPSSQAETSQAQARQKTGAMSVDVSTSELDEAAKSPQEAGVTVSQDATVNKGTVEPSD 140 

Query: 113 EAKAEGVSTVEDS PMDLGNTRSAVET NQQIS K 144 

EA + +D + + A E NQ+I+ K 

Sbjct: 141 EANQKEPEIKDDYSKQAADIQKATEDYKASVAANQAETDRINQEIAAKKAQYEQDLAANK 200 

Query: 145 AD ADTQKQVETINEVTK TYKADKATYESNKARIEQEN 181 

A+ A QK + I + Y A K Y+ AR++ N 

Sbjct: 201 AEVERSLMRMRKPRPIYEAKLAQNQKDLAAIQQaNSDSQAAYAAAKEAYDKEWARVQAAN 260 

Query: 182 KELSQAYEGANQTGKETNAOTDTKVNDLKARYADADVTVKEQ WSSGNGTSVL 234 

+AYE A N + ++++RAAD K +GN + 

Sbjct: 261 AAAKKAYEEALAANTAKNDQI KAEIEAI QQRSAKADYEAKLAQYEKDIAAAQAGNAANEA 320 

Query: 235 DY TNYGKAVETIQSTNEQAVADY LTKKTKADDIVAKNQAIQKENEA 280 

DY Y + + +Q+ NAY K I A+N+AIQ+ +A 

Sbjct: 321 DYQAKKAAYEQELARVQAANAAAKQAYEQAIAANSAKNAQITAENEAIQQNAQAKADYEA 380 

Query: 281 GLANAKADNEAI ERRNQAGQAAVDAEN RAGQAAVDQANQEKQQLVSDRAA 330 

LA A++ NAE Q AA + E +AAAQA +++ Q + + A 
Sbjct: 381 KIAQYQKDIAAAQSGNAANEADYQEKLAAYEKELARVQAANAAAKQAYEQQVQQANAKNA 440 

Query: 331 EIEAITKRNKEKEAAARKENEAIDAYNTKEMERYQRDLAEIS 372 

EI + +E+ A A+ + E + +E+ +Y++DLAE 
Sbjct: 441 EITEANRAIRERNAKAKTDYELKLSKYQEEIiAQYKKDLAEYPAKLQAYQDEQAAIKAALA 500 

Query: 373 KGEEGYISEALAQALNLNNGEPQAQHGAITRN--- 404 

K E+G +SE AQ+L + + EP AQ +T 
Sbjct: 501 ELEKHKlffiDGNLSEPSAQSL-VYDLEPNAQVALVTDGKLLKASALDEAFSHDEKNYNNHL 559 

Query: 405 --PDQI ISTGDALLGGYSRILDSTGF FVYDMFKTGETLS 441 

PD + +++ LG+ DG+ F+K G++ + 

Sbjct: 560 LQPDNLNVTYLEQADDVASSVELFGNFG DKAGWTTTVSNGAEVKFASVLLKRGQSAT 616 

Query: 442 FNYQNLQHARFDGKKI SRVTYDITNLVSPAG TNAVKLWPNDPTEGFIAYRNDGN 496 

Y NL+++ ++GKKIS+V YTVP T V L + DPT G A G 

Sbjct: 617 ATYTNLKNSYYNGKKISKWYKYT--VDPDSKFQNPTGNVWLGIFTDPTLGVFASAYTGQ 674 

Query: 497 GDWRTD---KMEFRWAKYYIjEDGSQVTFSKEKPGVFTHSSIJvIHNDIGLEYVKDSSGKFV 553 

+ T K EF +Y EDG+ + F + + +SLN +E KD SG FV 

Sbjct: 675 NEKDTSIFIKNEF TFYDEDGNPI DFDN ALLSVASLNREHNS I EMAKDYSGTFV 727 

Query: 554 PINGSTVQVTN EGLARSLGSNRASDLNLPEEWDTTS SRYAYKGAI V 599 

I+GS++ N EG + RAS+ WD+ + ++ GA 

Sbjct: 728 KISGSSIGEKNGMIYATDTLNFKKGEGGSLHTMYTRASEPG- - SGWDSADAPNSWYGAGA 785 

Query: 600 STVTSGNTY TVTFGQGDMPQNVGL SYWFALN 630 

++ NY T +MPQ G + W++LN 

Sbjct: 786 ATRMSGPNNYITLGATSATNVLSLAEMPQVPGKDNTAGKKPNIWYSmGKIRAVNVPKVTK 845 

Query: 631 - -TLPVARTVTPYSPKPHVTVEL EPIPEPITVTPDIYTPKTFTPEKPVTFT 679 

P P P V EL EP EP TP P PEKPV T 

Sbjct: 846 EKPTPPVEPTKPDEPTYEVEKELVDLPVEPKYEP-EPTPPSKNPDQSIPEKPVEPTYEVE 904 

Query: 680 PKPLDEWQPSLTLTKVT LPVKPIPKELPTPP QVPTV 716 

P P++ + T + T PV+P + LPTPP VPTV 

Sbjct: 905 KELEPAPVEPSYEKEPTPPQSTPDQEEPEKPVEPSYQSLPTPPVEPVYETVPGPVSVPTV 964 

Query: 717 HYHAYRLTTTSEIMKEVWSDQANLHEKTVAKDSTVIYPLTVDALSPNRAQTTSLIFEDY 776 

YH Y+L + KE+ N D ++ + VAK STV + L L R +TTS + D 

Sbjct: 965 RYHYYKIAVQPGVTKEIKNQDDLDIDKTLVRKQSTVKFQLKTADLPAGRPETTSFVLMDP 1024 

Query: 777 LPAGYLFDKETTQKENGNYVLSFDETKNFVTLTAKENLLQEVNKDLTQVYQLTAPKLYGS 836 

LP+GY + E T+ + + S+D + VT TA L +N+DLT+ P + G 

Sbjct: 1025 LPSGYQIjNLFATKVASPGFEASYDAMTHTVTFTATAETIiAMjNQDLTKAVATIYPTWGQ 1084 

Query: 837 VQNDGATYSNSYKLLLNKGTTNAYTVTSNWTVRTPG DGETTTLITPDKNNENAD 891 

V NDGATY+N++ L++N +AY + SN+V V TPG D + ITP K N+N + 

Sbjct: 1085 VLNDGATYTNNFTLMVN DAYGIKSNIVRVTTPGKPNDPDNPSNNYITPHKVNKNEN 1140 

Query: 892 GVLINDTWALGTTNHYRLTWDLDQYKGDRSAKETIARGFFFVDDYPEEVLDVVENGTAI 951 

GV+I+ V GTTN+Y LTWDLDQYKGD+SAKE I +GFF+VDDYPEE LD+ + + 
Sbjct: 1141 GWIDGKSVLRGTTNYYELTWDLDQYKGDKSAKEIIQKGFFYVDDYPEEALDLRTDLIKL 1200 

Query: 952 TTLDGQKVSGITVKNYASLI^PKDLQDKLARAKrTPTGAFQVFMPDDNQAFyDQYVQTG 1011 

T +G+ V+G++V +YASL AP +QD L +A I P GAFQVF DD- QAFYD YV TG 
Sbjct: 1201 TDANGKAVTGVSVADYASLEAAPAAVQDMLKKANIIPKGAFQVFTADDPQAFYDAYWTG 1260 

Query: 1012 TSLALLTKMTVKDSLYGQTKTYTNKAYQVDFGNGYETKEVTOTLVSPEPKKQ-N]^KD^ 1070 

T L ++T MTVK + +Y N+AYQ+DFGNGYE+ V N + P+K L D 

Sbjct: 1261 TDLTIVTPMWKAEMGKTGGSYENRAYQIDFGNGYESNLVVNNVPKINPEKDVTLTMDPA 1320 

Query: 1071 D, INGKPMLVGTQNHYTLSVJDLDQYRGI KADNSQIAQGFYFVDDYPE EALLPD 1122 

D ++G+ + + +Y L + I AD+++ + F DDY + 

Sbjct: 1321 DSTNVDGQTIALNQVFNYRLIGGI IPADHAEELFEYSFSDDYDQTGDQYTGQYKA 1375 

Query: 1123 EAAIQFVTSDGKTV-SGITVKSY--SQLLEAPKTLQAAFSKQKIQPKGAFQVFMPE 1175 

A + DG + +G + SY +Q+ EA + F + ++ F E 

Sbjct: 1376 FAKVDLTLKDGTIIKAGTDLTSYTEAQVDEANGQIWTFKEDFLRSVSVDSAFQAE 1431 
Identities = 209/442 (47%) , Positives = 280/442 (63%) , Gaps = 27/442 (6%) 

Query: 1198 TVLETMLNSGKS Y - ENVAYQ VDFGQAYETNTVTNFVPK VTPHKSNTNQ 1244 

TV+ +I1N G +Y N V+ ++N V P +TPHK N N+ 

Sbjct: 1080 TWGQVMtDGATYTlMFTLMVffiDAYGIKSNI^ 1139 

Query: 1245 EGI S IDGKTVLPNTVNYYKIVLDYSQYKDMVVTDDVIAKGFYMVDDYPEEALTLNPDGIQ 1304 

G+ IDGK+VL T NYY++ D QYK +++ KGF+ VDDYPEEAL L D 1 + 

Sbjct: 1140 NGWIDGKSVTAGTTNYYELTWDLDQYKGDKSAKEIIQKGFFYVDDYPEEALDERTDLIK 1199 

Query: 1305 VLDKDGNRVSGISVSTYASLSEAPKWQDAMAKRQFTPKGAIQVLSSDDPKVFYDTYVKT 1364 

+ D +G V+G+SV+ YASL AP VQD + K PKGA QV ++DDP+ FYD YV T 
Sbjct: 1200 LTDANGKAVTGVSVADYASLEAAPAAVQDMLKKANI I PKGAFQVFTADDPQAFYDAYWT 1259 

Query: 1365 GQTLVVTLPMTVKlWLTKTGGQYENTAYQIDFGIAYvTETVVNNVPKLDPQK^ 1424 

G L + PMTVK E+ KTGG YEN AYQIDFG Y + WNNVPK++P+KDV + + 
Sbjct: 1260 GTDLTIWPMTVKAEMGKTGGSYEM^YQIDFGNGYESNLVVNNVPKINPEKDVTLTMDP 1319 

Query: 1425 KDA-SLKSKEVALHQTFtnTRLVGAMIPSNRATDLFEYGFEDNYDEKHDEYlIGVYRSYLMT 1483 

D+ ++DG+ +AL+Q FNYRL+G +IP++ A +LFEY F D+YD+ D+Y G Y+++ 
Sbjct: 1320 ADSTiWDGQTIALNQVFNYRLIGGIIPADHAEELFEYSFSDDYDQTGDQYTGQYKAFAfCV- 1379 

Query: 1484 DVILKDGSVLKEGTEVTKYTLQQVDTENGLVSISFDKSFLETVSDDSAFQADVYLQMKRI 1543 

D+ LKDG+++K GT++T YT QVD NG + ++F + FL +VS DSAFQA+VYLQMKRI 
Sbjct: 1380 DLTLKDGTIIKAGTDLTSYTEAQVDEANGQIWTFKEDFLRSVSVDSAFQAEVYLQMKRI 1439 

Query: 1544 AAGQVENTYLHTVNGYVISSNTWTHTPQPEEPSPNQP TPPQPPIETIEPPV 1595 

A G NTY++TVNG SSNTV T TP+P++PSP P P Q PP 

Sbjct: 1440 AVGTFANTYWTWGITYSSOTTOTSTPEPKQPSPVDPKTTTTWFQPRQGKAYQPAPPA 1499 

Query: 1596 PASILPNTGEQES LLGLI 1613 

A LP TG+ + LLGL+ 
Sbjct: 1500 GAQ - LPATGDSSNAYLPLLGLV 1520 
Identities = 100/210 (47%), Positives = 137/210 (64%), Gaps = 4/210 (1%) 

Query: 1060 PKKQNI^KDKVDINGKPMLVGTQNHYTLSTOLDQYRGIKADNSQIAQGFYFVDDYPEEAL 1119 

P K N N++ V I+GK +L GT N+Y L+WDLDQY+G K+ I +GF++VDDYPEEAL 
Sbjct: 1132 PHKVNKNENGWIDGKSVIiAGTTNYYELTWDLDQYKGDKSAKEIIQKGFFYVDDYPEEAL 1191 

Query: 1120 LPDEAAIQFVTSDGKTVSGITVKSYSQLLEAPKTLQAAFSKQKIQPKGAFQVFMPEDPQA 1179 

1+ ++GK V+G++V Y+ Ii AP +Q K I PKGAFQVF +DPQA 
Sbjct: 1192 DLRTDLIKLTDANGKAVTGVSVADYASI.EAAPAAVQDMLKKANIIPKGAFQVFTADDPQA 1251 

Query: 1180 FFESYOTKGENITIOTPMTVLETMMSGKSYENVAYQ 1239 

F+++YV G ++TIVTPMTV M +G SYEN AYQ+DFG YE+N V N VPK+ P K 
Sbjct: 1252 FYDAYVVTGTDLTIOTPMTVKAEMGKTGGSYElSn^YQIDFG 1311 

Query: 1240 SNT NQEGISIDGKTVLPNTVNYYKIV 1265 

T + ++DG+T+ N V Y+++ 

Sbjct: 1312 DVTLTMDPADSTNVDGQTIALNQVFNYRLI 1341 



There is also homology to SEQ ID 598. 

SEQ ID 5862 (GBS76) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 17 (lane 2; MW 17.4kDa). The GBS76-His fusion product was purified (Figure 
196, lane 8) and used to immunise mice. The resulting antiserum was used for FACS (Figure 294), which 
confirmed that the protein is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1885 

A DNA sequence (GBSx1993) was identified in S.agalactiae <SEQ ID 5863> which encodes the amino 
acid sequence <SEQ ID 5864>. This protein is predicted to be abortive infection bacteriophage resistance 
protein (abiEi). Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2765 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9931> which encodes amino acid sequence <SEQ ID 9932> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB52382 GB:U36837 AbiEi [Lactococcus lactis] 
Identities = 51/206 (24%) , Positives = 90/206 (42%) , Gaps = 23/206 (11%) 



Query: 17 KNNGIVTNKDCKALGIPTIYLTRLEKEGIIFRVEKGIFLTQNGDYDEYYFFQYRFPKAIF 76 

KG+K + GI YL+ + + V+KG+++ + D + FQ ++ KA+ 

Sbjct: 76 KYKGNIIRKIVRDEGISDYYLRKFVLKYNLTEVDKGVYIFPHKKKDSLFIFQQKYSKAVI 135 

Query: 77 SYISALYLQQFTDEIPQYFDVTVPRGYRF NTPPANLNI 114 

S+ ++LYLQ D IPQ ++VP Y N N+ I 

Sbjct: 136 SHETSLYLQDVIDYIPQKIQMSVPEKYNISRIQEPHENRLTSYNYVDINSNNIMDKNIPI 195 

Query: 115 HFV-SKEYSELGMTVVPTPMGNNVRVYDFERIICDFVIHREKIDSELFVKTLQSYGNYPK 173 

+ V +K S + TV + +G +RV RID+ K + E+ +++Y 

Sbjct: 196 NLVRNKSISPTQIETVNSFLGLPLRVTSIARSIVDVLKPSHKAEEEVKEQAIKYYLERFP 255 

Query: 174 KNLAKLYEYATKMNTLEKVKQTLEVL 199 

N+ +L A N L++++ L +L 

Sbjct: 256 DNIVRLKRIAKTQNVLKELEYYLILL 281 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1886 

A DNA sequence (GBSx1994) was identified in S.agalactiae <SEQ ID 5865> which encodes the amino 
acid sequence <SEQ ID 5866>. This protein is predicted to be abortive infection bacteriophage resistance 
protein (abiEii). Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -1.12 Transmembrane 260 - 276 ( 259 - 277) 

Final Results 

bacterial membrane --- Certainty=0.1447 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB52383 GB:U36837 AbiEii [Lactococcus lactis] 
Identities = 76/276 (27%) , Positives = 135/276 (48%) , Gaps = 19/276 (6%) 

Query: 14 SKNTGLTFNSVMTYYFLEVir,KmSQSSYSNHYIFKGGFLLSNVIGVESRSTvDIDFLFH 73 

++ N + + Y E L +LS S Y ++ KGGFL+ + R+T D+D 

Sbjct: 12 TRNDDIGIENYRIRYATERFLTRLSASQYKEKFVLKGGFLIGVTYNLSQRTTKDLDTALI 71 



Query: 74 QITLSEETVKQQLKEIL-ADSEEGISFVIQSITTIKESDDYGGYRATISCQLE--NIKQV 130 

+++++ + EI D E+ + F ++ +T+ ++ Y GYRA + N + 

Sbjct: 72 DFKSDAQSIERVITEICNIDLEDQVLFKLKELTSSQDMRIYPGYRAKLKMMFPDGNTRID 131 

Query: 131 IHLDIATGDWTPQPITYDYKAIFDE DNFPI IAYTIETIIAEKLQTI YSRNFLNS 185 

LDI GD +TP+ IF+E ++AY ETI AEKL+TI +R +N+ 

Sbjct: 132 FDLDIGVGDRITPFAKKIKIPLIFNEVKGVEKQIEVXAYPKETIQAEKLETILTRGKVNT 191 

Query: 186 RSKDFYDVYIli- -SKLKKKDIDFNQLKNACQRTFSYRE-TELDFEKIIE LLERFK 237 

R KD+YD ++L + IF A + T+ +R T+ E++ E L E + 

Sbjct: 192 RMKDYYDFHLLLTDQENSNS I S FYY AFKNTWEFRNPTQFIDEELFEDWLFILDEILE 248 

Query: 238 SDPTQNQQWQNYSKKYSYTKGISLANVLDEMISLIT 273 

S ++WNYK +YK +++ +++ E+ ++ 
Sbjct: 249 SKELKEKYWPNYIKDRNYAKHLNMDDIISEIKEFVS 284 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1887 

A DNA sequence (GBSx1995) was identified in S.agalactiae <SEQ ID 5867> which encodes the amino 
acid sequence <SEQ ID 5868>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1137 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1888 

A DNA sequence (GBSx1996) was identified in S.agalactiae <SEQ ID 5869> which encodes the amino 
acid sequence <SEQ ID 5870>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2782 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1889 

A DNA sequence (GBSx1997) was identified in S.agalactiae <SEQ ID 5871> which encodes the amino 
acid sequence <SEQ ID 5872>. Analysis of this protein sequence reveals the following: 

Possible site: 21 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -10.14 Transmembrane 310 - 326 ( 301 - 334) 

Final Results 

bacterial membrane --- Certainty=0.5055 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG38044 GB:AF295925 Orf28 [Streptococcus pneumoniae] 
Identities = 272/344 (79%), Positives = 307/344 (89%) 

Query: 568 VYVNPAFYFPKVIQVQTTILPTIGQFGGDEFERAKAIYDYLKSKGATNQAIAAILGNWSV 627 

+YVNP FYFPKVIQ+QTTILP IGQFGGDEFERAK IY++LKS+GA+ QAIAAILGNWSV 
Sbjct: 1 MYVNPQFYFPKVIQLQTTILPAIGQFGGDEFERAKHIYEFLKSQGASPQAIAAILGNWSV 60 

Query: 628 ESSINPKRAEGDYLSPPVGATDSSWDDEGWLTLNGPTIYNGRYPNILKRGLGLGQWTDTA 687 

ESSINPKRAEGDYL+PPVG WDDE WL + GP IY+G YPNIL RGLGLGQWTDTA 

Sbjct: 61 ESSINPKRAEGDYLTPPVGVPIPPWDDESWLAIGGPAIYSGAYPNILHRGLGLGQWTDTA 120 

Query: 688 DGSRRHTLLLEYAKGKHQKWYDLGLQLDFMLYGDSPYYTNWLKDFFKNSGSPASLAQLFL 747 

DGS RHT LL YA+ +++KWYDL LQLDFML+GDSPYY +WLKDFFKN+GS A+LAQLFL 
Sbjct: 121 DGSTRHTALLNYARTQNKKWYDLDLQLDFMLHGDSPYYQSWLKDFFKNTGSAANLAQLFL 180 

Query: 748 IYWEGNSGDKLLERQTRASEWYYQIEKGFSQPNGGTAQSDPKALEAVREDLFENSIPGGG 807 
YWEGNSGDKLLERQTRA+EWYYQIEKGFSQ NGG A+SDP++LE VR DL+++S+PGGG 

Sbjct: 181 TYWEGNSGDKLLERQTRATEWYYQIEKGFSQTNGGQAKSDPQSLEGVRGDLYDHSVPGGG 240 

Query: 808 DGMGYAYGQCTWGVAARINQLGLKLKGKNGEKIPIISTMGNGQDWVRTAASLGGETGTSP 867 
DGM YAYGQCTWGVAAR+NQLGLKLKG+NGEKI II+TMGNGQDWV T++SLGGETG++P 
Sbjct: 241 DGMAYAYGQCTWGVAARMNQLGLKLKGRNGEKISIINTMGNGQDWVATSSSLGGETGSTP 300 

Query: 868 QEGAILSFAGGGHGTPTEYGHVAFVEKVYPDGSFLISETNYNGN 911 

+ GAI+SF GG HGTP YGHVAFVEKVY DGSFL+SETNY GN 
Sbjct: 301 RAGAIVSFVGGTHGTPASYGHVAFVEKVYDDGSFLVSETNYGGN 344 

SEQ ID 5872 (GBS74d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 121 (lane 3 & 4; MW 95.5kDa). It was also expressed in E.coli as a His-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 121 (lane 5-7; MW 70.5kDa) and in 
Figure 179 (lane 9; MW 70.5kDa). 

GBS74d-His was purified as shown in Figure 233, lane 7-8. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1890 

A DNA sequence (GBSx1998) was identified in S.agalactiae <SEQ ID 5873> which encodes the amino 
acid sequence <SEQ ID 5874>. This protein is predicted to be TrsE-like protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 55 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.5526 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG38042 GB:AF295925 Orf26 [Streptococcus pneumoniae] 
Identities = 618/782 (79%) , Positives = 712/782 (91%) , Gaps = 1/782 (0%) 



Query: 1 MKICbKHbiYIKSK-TSSNDK^ 59 

MK+ +++K + TS+ +KK++ K +K+E+ PST NTL+YQ L+QNGLMQV YFSQ+YL 

Sbjct: 3 MKRKSOTLKKQQTSTTNKKEEVKDKKEEVLPSTANTLSYQALYQNGLMQVKEDYFSQSYL 62 

Query: 60 LGDVNYQTVGLDDKGAIVEKYSDLINSLDDKTNFQLTIFNQKVNLEKFRKSILYPLQEDG 119 

LGDVNYQTVGL+DKGAI +EKYSDLI SLDD+TNFQLTIFN+++NLEKFR S+LY +EDG 

Sbjct: 63 LGDVNYQTVGLEDKGAIIEKYSDLIKSLDDQTNFQLTIFNKRLNLEKFRHSVLYEEKEDG 122 

Query: 120 FDTYRDELNRMMDANLEAGENNFSAVKFLSFGKSDQTPKLAFRSLSQIGEYFKSGFSEID 179 

+D+YR ELNRMM+ NL++GENNFSAVK +SFG+ D PK A+RSLSQIGEYFKSGFSEID 

Sbjct: 123 YDSYRKELNRMMNQNLDSGENNFSAVKLISFGRKDSNPKQAYRSLSQIGEYFKSGFSEID 182 

Query: 180 VSLGLLGGEERVNVLADMLRGENHLPFSYKDLTLSGQSTKHFIAPTYLSFKHKNHIELDD 239 

L GEERVN+LADMLRGE+HLPFSY+DLT SGQ+T+HFIAP L FK+KN+++++D 

Sbjct: 183 ARFESIAGEERVNLLADMLRGEHHLPFSYRDLTRSGQTTRHFIAPNLLDFKNKNYLQIND 242 

Query: 240 RLLQIVYVRDYGMELGDKFIRDLMQSDLEVMISLHAKGSTKSETMTKLRTKKTLMESQKI 299 

RLLQIVYVRDYGMELGD+FIRDLMQ DLE+++SLHA+ STKS+ M KLRTKKTLMESQKI 

Sbjct: 243 RLLQIVYVRDYGMELGDQFIRDLMQGDLELIVSLHAQSSTKSDAMKKLRTKKTLMESQKI 302 

Query: 300 GEQQKMARTGIYLEKVGHVLENNIDEAEALLQTMTQTGDKLFDTVFLIGVLADTEDQLKQ 359 

GEQQK+ARTGIYLEKVGHVLE+NIDEAE LL+TMT+TGDKLF TVFLIGV E++LKQ 

Sbjct: 303 GEQQKLARTGIYLEKVGHVLESNIDEAEELLKTMTETGDKLFQTVFLIGVFGQDEEELKQ 362 

Query: 360 SLDIIKQVAGSNDMIIDNLTYMQEAAFNSLLPFGKNYLEGVSRSLLTSNIAVNAPWTSVD 419 

+LD ++QVAGSND++ID L YMQEAAFNSLLPFG ++LEGVSRSLLTSNIAVN+PWTSVD 

Sbjct: 363 ALDTVQQVAGSNDLMIDKLPYMQEAAFNSLLPFGCDFLEGVSRSLLTSNIAVNSPWTSVD 422 

Query: 420 IHDKGGKFYGINQISSNIISIDRGKLNTPSGLILGTSGAGKGMATKHEIISTKLKEADSD 479 

+ D+ GK+YGINQISSNII+IDR LNTPSGLILGTSGAGKGMATKHEII+TK+KE+ + 

Sbjct: 423 LQDRSGKYYGINQISSNIITIDRSLLNTPSGLILGTSGAGKGMATKHEIITTKIKESGEN 482 

Query: 480 TEIIIVDPENEYSIIGQAFGGESIDIAPDSTTFLNVLELSDENMDEDPVKVKSEFLLSWI 539 

TEIIIVDPE EYS+IG+ FGGE IDIAPDS T+LNVL+LS+ENMDEDPVKVKSEFLLS+I 

Sbjct: 483 TEIIIVDPEAEYSVIGRTFGGEMIDIAPDSETYLNVLDLSEENMDEDPVKVKSEFLLSFI 542 

Query: 540 GKLLDRKMDGREKSLIDRVTRLTYKHFDTPSLVEWVFVLSQQPEQEAKDLALDMELYVEG 599 

GKLLDRKMDGREKS+IDRVTRLTY+ F PSL EWVFVLSQQPE+EA++LALDMELYVEG 

Sbjct: 543 GKLLDRKMDGREKSIIDRVTRLTYQSFKEPSLEEWVFVLSQQPEEEAQNLALDMELYVEG 602 

Query: 600 SLDIFSHRTNIKTDSHFLIYNVKKLGDELKQIALMVIFDQIWNRVVKNQKLGKKTWIYFD 659 

SLDIFSH+TNI+T S+FLIYNVKKLGDELKQIALMV+FDQIWNRVV+NQKLGKKTWIYFD 
Sbjct: 603 SLDIFSHKTNIQTGSNFLIYNVKKLGDELKQIALMVVFDQIWNRVVRNQKLGKKTWIYFD 662 

Query: 660 EMQLLLLDKYASDFFFKLWSRVRKYGAIPTGITQNVETLLLDANGRRIIANSEFMILLKQ 719 

E++LLLLDKY SDFFFKLWSRVRKYGA PTGITQNVETLLLD NGRRIIANSEFMILLKQ 
Sbjct: 663 EIELLLLDKYPSDFFFKLWSRVRKYGASPTGITQNVETLLLDPNGRRIIANSEFMILLKQ 722 

Query: 720 AKSDREELVHMLGLSKELEKYLVNPEKGAGLIKAGSTVVPFKNKIPQHTKLFDIMSTDPE 779 

AK+DREELV +LGLSKELEKYLVNPEKGAGLIKAGS VVPFKNKIPQ ++LFDIM +DP+ 
Sbjct: 723 AKNDREELVQLLGLSKELEKYLVNPEKGAGLIKAGSVVVPFKNKIPQGSQLFDIMRSDPD 782 

Query: 780 KM 781 
KM 

Sbjct: 783 KM 784 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8925> and protein <SEQ ID 8926> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: -26.26 
GvH: Signal Score (-7.5): -3.87 

Possible site: 55 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 0 value: 6.26 threshold: 0.0 
PERIPHERAL Likelihood = 6.26 335 
modified ALOM score: -1.75 

*** Reasoning Step: 3 

Final Results 

bacterial cytoplasm --- Certainty=0.5526 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

33.5/57.2% over 789aa 

Enterococcus faecalis 
GP|8100663| TrsE-like protein Insert characterized 

ORF01332(319 - 2628 of 2949) 
GP|8100663|gb|AAF72347.1|AF192329_8|AF192329(2 - 791 of 799) TrsE-like protein 
{Enterococcus faecalis} 

%Match = 20.7 

%Identity = 33.4 %Similarity = 57.2 
Matches = 259 Mismatches = 323 Conservative Sub.s = 184 

210 240 270 300 330 360 387 
SCYLGSIAPTIYHLKYTSSWFIMN*RCQTAHLLEEKETNvKKLKHSMKSKTSSNDKKQKTKTQKQEI S 

II :| : |: I = = 1 I 

MSKKEIPRETEKTKLTRAQRKEIDAVIRKYKGDGR 

10 20 30 






414 444 474 504 534 564 594 624 

PSTVN-TIAYQGLFQNGLMQVSPSYFSQTYLLGDVNYQTVGLDDKGAIVEKYSDLINSLDDKTNFQLTIFNQKVNLEK^ 

II =: 1= := =1= =111 11= = l==ll I = II II II I =1 = l== =1=11= == 

PHTAQQSIPYEVMYPDGVCRVSPGvFSKCIEFADISYQLAQPDTQTAIFEKLCDLYNYvDASIHIQFSFLmKvDPVQYA 
50 60 70 80 90 100 110 



654 684 714 744 774 804 834 864 

60 KSILYPLQEDGFDTYRDELNRMMDANLFAGENNFSAVKFLSFGKB^ 

II I I II I I == III hhl =: I I I =11 = I : :: | 

KSFEIAPQGDDFDDIRAEYTGILQKQLANGlSnSIGMVKTKYLTFTIF 
130 140 150 160 170 180 190 

894 921 951 981 1011 1041 1071 1101 

ERVNVLADMLRGENHL - PFSYKDLTLSGQSTKHFIAPTO^ 



ERLNLLHGVYHPDGE I FNFDWKWLAPSGLSTKDF IAPSSLCFGNAKTFGMGGKYGAVS FLQI LS PELSDDMLADFLNTES 
210 220 230 240 250 260 270 

1131 1161 1191 1221 1251 1281 1311 1341 

EVMISLHAKGSTKSETMTKbRTKKTLMESQKIGEQQKMARTGIYL^ 

|:::|| : :::::: | | ::: || ||:| |:| = = : | >:|: II : ::|| Ih 

GVLVNLHVQAIEQTKAIKTIKRKITDLDAMKIAEQKKATOSGYDMDILPSDIATYGEDAKKLLTKLQTRNERLFQLTFLV 
290 300 310 320 330 340 350 

1371 1401 1431 1461 1491 1521 1551 

GVLADTEDQLKQSLDIIKQVAGSIfflMIIDl^TYMQFJUiFNSLLPFGKISryLEGVSRSLLTSNIAWAPWTSVDIHDKGGK- 

:|||: :| : || :: : I I II = I 11=1 I = =111 ll==ll 1= = == I 

IOTADTKQm^VFQAAGVAQKHNCPLTOLDYQQEQGIASSLPLGVNQI-KIQRSLTTSSVAVFVPFVTQELFQGGAAM 

370 380 390 400 410 420 430 

1608 1638 1668 1698 1728 1758 1788 1818 

FYGINQISSNIISIDRGKLNTPSGLILGTSGAGKGMATKHEIISTKLKEADSDTEIIIVDPENEYSIIGQAFGGESIDIA 

=1111 I 1=1 =11 = 1= I III 1=11 1 = I 11=1 I I =i I III II = = = 1= I == 
YYGINAKSRNMIMLDRKQARCPNALKLGTPGSGKSMSCKSEIVSVFLTTPD DIFISDPEAEYYPLVKRLHGQVTRLS 

450 460 470 480 490 500 510 

1848 1875 1905 1935 1959 1989 2019 

PDSTTFLNVLELS-DE1MDEDPVKVKSEFLLSWIGKLLDRK--MIX3REKSLIDRVTRLTYKHFDTPSLVEWVFVLS 

I I |=| |=== = = |::|: =11=1=11= == I == ll==lll 1= 1= = I = =11 

PTSKDFVNPLDINLNYSEDDNPLALKSDFVLSFCELVMGGKNGLEAIEKWIDRAVRVIYRPYLADPRPElsMPILSDLHK 

530 540 550 560 570 580 590 

2058 2088 2118 2148 2178 2208 2238 2268 
QQPEQEAKDLALD^LYVEGSLDIFSHRTNIKTDSHFLIYNVKKL^^ 

i ii =i = = in iii = = i = iiii= = == = = = i = n =ii = = = = = = mi ii i= n ii 

ALLDQHVPEADRVAQALDLYVSGSlWFiraRTi™ 

610 620 630 640 650 660 670 

2298 2328 2358 2388 2418 2448 2478 2508 

YFDEMQLliLLDKYASDFFFKLWSRWKYGAIPTGITQNVETLLLDANGRRIIANSEFMILLKQAKSDREELVHMLGLSKE 

= II =111 == = = : = ) 1 11 = 1 till 1111= II 1= 11 = 1= n ii in I 1 II 1 

FADEFHLLLKEEQTAAYSAEIWKRFRKWGGIPTGATQNVKDLLSSPEIENILENSDFITLLNQASGDRKILAERLNLSTE 
690 700 710 720 730 740 750 

2538 2568 2598 2628 2658 2688 2718 2748 

LEKYLWPEKGAGLIKAGSTWPFKNKIPQHTKLFDIMSTDPEJ^T*DERG*KASQTG*AKLSKQLKISSYALSERS*D 

= n= i i i n= = i = n i n = = i = i= n = i = 

QQKYIDNSEPGEGLLI FENWLPFTNPI PHNTQLYKIMTTRLNEVAGV 
770 780 790 

A related GBS gene <SEQ ID 8927> and protein <SEQ ID 8928> were also identified. Analysis of this 
protein sequence reveals the following: 

This protein might be involved in vancomycin resistance 

The protein has homology with the following sequences in the databases: 

>GP|8100663|gb|AAF72347.1|AF192329_8|AF192329 TrsE-like protein 
{Enterococcus faecalis} 

Score = 427 bits (1086), Expect = e-118 

Identities = 257/785 (32%), Positives = 431/785 (54%), Gaps = 28/785 (3%) 

Query: 9 DKKQKTKTQKQE I S PSTVN-TIAYQGLFQNGLMQVSPSYFSQTYLLGDV 56 

+K + T+ Q++EI P T ++ Y+ ++ +G+ +VSP FS+ D+ 

Sbjct: 11 EKTKLTRAQRKEIDAVIRKYKGDGRPHTAQQSIPYEVMYPDGVCRVSPGVFSKCIEFADI 70 



Query: 57 NYQTVGLDDKGAIVEKYSDLINSLDDKTNFQLTIFNQKVNLEKFRKSILYPLQEDGFDTY 116 






+YQ D + AI EK DL N +D + Q + N+KV+ ++ KS Q D FD 



OJJJ (— L. . 


i x 




130 


Query: 


117 


RDELjNTRMMDANLEAGEI<rNFSAVKFLSFGKSDQTPKj^FRSLSQIG 


176 






RE ++ L G N K+L+F ++ K A L +IG F + 




Sb j ct : 


131 


R AEYTGT T iOKOT i ANONWGMVKTKYLTFT I EAE S VKAARARLKRIGFDLLGYFKSMGAVAH 

XVJTiX— 1 X X \3 X XJ^XV^XXT^LNvJXVXNVJl 1 ! V XX J. XV X J— 1 X X X X J—ajTIXj O V I VO-£j-XvjT1-1.\ J-4 X VX V X Xjj-iJ-i\J jl. J. ivi^n\jj7i V X*X X 


190 


Query: 


177 


LLGGEERV1.TOADMLRGENHL-PFSYKDLTLSGQSTKHFIAPTYLSFKHKNHIELDDRLL 


235 






++ G ER+N+L + + + F +K L SG STK FIAP+ L F + + + 




Sb j Gt : 


191 


V T¥«Pif\ iiimi 1 1 f n \ -r y ^ XX iTi—' VjX_j X. X7 XNX7 XJ VV XVrY XXC1X7 UwXJu X XvXJL X JT1X7 LJUXJui VJXNjTiXV X X urluuUXu 


250 


Query: 


236 


QIVYVRDYGMELGDKFIRDLMQSDLEVMI^ 


295 






4- 4-4-4- F.T. n 4- D 4- +4- V4-4-4-LH 4- 4-4-4- 4- 4-4- K T 4-4-4- KI EO 

~ ~ l T AXlXJ X-/ I X-/ 1 1 1 V TTTiUX T 1 1 I 1 1 I XX X 1 I I J.V4i J— I ^ 




OJJJ L. • 


*?m 
^ji 


jfiv O X7 XJ^X JjOxrxxiJJOJJJJl v lJ_iriJJr J_ixN X xxiOVs v XJ VX\I_ixl V^lnl£iUl xvrLXxvX X xvxvxvX X xjxj UsriX 1 ! XV X jrixxii,^ 


310 


Query: 


296 


QKIVL^TGIYLEKVGIx^ 


355 






4-ff R4.f4" 4-4- 4- T. 4-4-A4- T.T. 4- 4-4.T1F FTi4- 4.ADT4- 4-Ij 4- 

TXV I\.TU TT T XJ TTJTlT XJXJ T TTXJX7 X7 XJT TilWlT TXJ T 




QVin r*\- • 
oJJJ t- U . 


"3. 1 1 


TfTf A^7PQnvnMnTT,PQriT.ATVnFnATCVTT^^ 

xVJvtt.VKov3xxJx w ilJX xJJr\3xJxxtt.x x v3Ej]Jj^ir\xvJJXxx JT\JUVJ.X JXW&xXljJ? yxj J- r xg VXxLn VJ-lu x xvyrU-iiyXNXJ v r 


370 


Query: 


356 


IIKQVAGSITOMIIDNLTYMQEAAFNSLLPFGKOT 


415 






VA ++ + Ti Y OE S LP G N ++ + RSL TS++AV P+ + ++ 

VXT. TT T XJ X l - } XJXT V3 XI 1 J T XVOXJ J.UT ~JT>. v XT 1 1 1 1 




oJjj cc : 


■DTI 
J /X 


yjTittvsVAyi^j^Nv-JrJj VK^ xOOV.tt.vr vxrr V lyjixjry 




Query: 


416 


KGGK-FYGINQISSNIISIDRGKL^ 


474 






\J TIUllM O X\TX TXJXX T FT XJ J_lX3 X VJJTVJXX 1*1T X\. XJ XJ T 




OJJJ Ct . 


-3 u 


uuAAl'ix xv7XxvjHj\oK±W1X1 v i1jijk VDvrui x xrxJ xj 




Query: 


475 


IIIVDPElSEYSIIGQAFGGESIDIAPDSxTFl^ 


533 






T T TiPP PV 4. 4. d4- T 4-4.P P4-N T14.4-4- 4- 4- TJ4-4-P4- 4-KS4-F4-LS+ 

X X XJxTXli XLl X T T VJT^ X TTT Cj IT ^X\ XJTTT " X/TTrT TIUJT1 TXJUT 




OJJJ CU . 


^to / 


lr XoLJj^rli/^iL x x irxj VlVjKljxxoy V XIXxjO ir 1 OixiJr vx\ JrxjJJXl\xxL^J x OxxixJUlN xrlxttXjJ\.OxJr VXJOr 


OfiD 


Query: 


534 


KLLDRK--MDGREKSLIDRVTRLTYKHF B^Sl^JEil^VFVhSQQ^QEI'J^LJ^ 


584 






4.4. K 4-4- "RK4-4.TDR R+ Y+ 4- 4. p Tj + L O EA 4-A 

1 1 XV TT IjXVTTXX-'XV IvT X t 1 T AT XJ T XJ \£ WT* « 




OJJJ v-L. * 


34 / 


T \m^nT^Tf^T 1 PATT?Tf'T 1 \7'Tr)RA'\?T?'\/'T"V"P PYT.ADPTJPFNTVfPTT ,QnT.T4:TrATiT.T)nrT\7PFAT)P. VAD 
JLi Vl v l\JvjrvLNvjl.)Oi^iX xIjIV 1 V X JJxvrt. v rv vl x rvxr x Xx^ii^xrxvxrxlil , Jl*lir XXJOlJXjflJXjf-ixJJJX^Sit* 1 V * JZ»jriJJxv. VjC-V^ 


606 


Query: 


585 


DMELYVEGSLDIFSHRTNIKTDSHFLIYIWKKIjGDELKQIAIjiW 


644 






++LYV G55L++F+HRTN+ + + ++ + K+LG +LK++ ++++ DOIW RV N+ G 

T TXJ X V vUUTT X TXlxV XXN l I T TTTI\TXnJ TXJIV T I ~ r T T xy J- l"f XV V XN T 




OJJJ L- . 


o u / 


AT.nT.VVQnGTiftrtnrWRPT'TXrc 

M I M J I 1 x V O'dOXJxN V I? lNxxxvllN V xJ X ^jIN xvxj V O X7 iJ X xvCjXJVJxvyxj xvTvXjVsj'iXJ X V^JjyX rvvjxv V XJru\x\0^o 


666 


Query: 


645 


KKTOIYFDEMQLLLLDKYASDFFFK^ 


704 






K TW + DE LLL 4-4- +4- 4-+W R RK+G IPTG TQNV+ LL 1+ NS 




OJJJ Ct. . 


oo / 


ATTaTVPATiFFT-TT .T,T .KTr'n'riTA AVG AF TWTTR FRTnflf^TPTnATnTJ\7TTnT .T.ClQPFTF'NrTT.FNC! 
xvttl ri x x7jrixJEiX7 n I 11 1 i 1 1% Pi ri^j lJru-ix orixli X W fvxv r lAivrV^jAjrX xr 1 Kss-i. x yxM V x\X/xjxj O O xr xxi X xxilN XXjxxU.NO 


726 


Query: 705 EFMILLKQAKSDREELVHMLGLSKELEKYLVNPEKGAGLIKAGSTVVPFKNKIPQHTKLF 764 

+F+ LL QA DR+ L L LS E +KY+ N E G GL+ + V+PF N IP +T+L+ 

Sbjct: 727 DFITLLNQASGDRKILAERLNLSTEQQKYIDNSEPGEGLLIFENVVLPFTNPIPHNTQLY 786 

Query: 765 DIMST 769 

IM+T 

Sbjct: 787 KIMTT 791 





SEQ ID 8926 (GBS75) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 17 (lane 11; MW 89.8kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 20 (lane 6; MW 114.7kDa). 

GBS75-GST was purified as shown in Figure 197, lane 8. 

GBS329 was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown 
in Figure 77 (lane 8; MW 89kDa). It was also expressed in E.coli as a GST-fusion product. SDS-PAGE 
analysis of total cell extract is shown in Figure 174 (lane 2; MW 114kDa). 

GBS329-GST was purified as shown in Figure 220, lanes 9 & 10. 

Example 1891 

A DNA sequence (GBSx1999) was identified in S.agalactiae <SEQ ID 5875> which encodes the amino 
acid sequence <SEQ ID 5876>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2442 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1892 

A DNA sequence (GBSx2000) was identified in S.agalactiae <SEQ ID 5877> which encodes the amino 
acid sequence <SEQ ID 5878>. This protein is predicted to be DNA-directed RNA polymerase II largest 
subunit. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4393 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1893 

A DNA sequence (GBSx2001) was identified in S.agalactiae <SEQ ID 5879> which encodes the amino 
acid sequence <SEQ ID 5880>. Analysis of this protein sequence reveals the following: 



Possible site: 13 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -9.92 Transmembrane 256 - 272 ( 250 - 277) 
INTEGRAL Likelihood = -8.28 Transmembrane 216 - 232 ( 213 - 244) 
INTEGRAL Likelihood = -8.12 Transmembrane 151 - 167 ( 148 - 191) 
INTEGRAL Likelihood = -7.27 Transmembrane 57 - 73 ( 54 - 80) 
INTEGRAL Likelihood = -6.74 Transmembrane 93 - 109 ( 88 - 111) 
INTEGRAL Likelihood = -3.50 Transmembrane 172 - 188 ( 168 - 191) 
INTEGRAL Likelihood = -2.76 Transmembrane 113 - 129 ( 110 - 130) 

Final Results 

bacterial membrane --- Certainty=0.4970 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:AAG38039 GB:AF295925 Orf23 [Streptococcus pneumoniae]
Identities = 71/86 (82%), Positives = 83/86 (95%)

Query: 37 VKSLADENPTWSYMTAITKGIMQPLGVAILAv^vlEFSKMAKKi™SGGAMTFEAIAP 96
 +KSL+ +NPTVW+YM++ITK +MQPLGVAIL+WL+LEFSKMAKKIANSGGAMTFEA+AP
Sbjct: 1 MKSLSSYNPTVWYMSSITKSVMQPLGVAIIiSWIjlljEFSKi^KKIJ^SGGMTFFALAP 60

Query: 97 MI VSYIIWAWITNTTVI VEAI IAIA 122
 M++SYIMVAWITNTTVI VEAI I IA
Sbjct: 61 MLISYIMVAWITNTTVIVEAIIGIA 86

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1894

A DNA sequence (GBSx2002) was identified in S.agalactiae <SEQ ID 5881> which encodes the amino 
acid sequence <SEQ ID 5882>. Analysis of this protein sequence reveals the following: 

Possible site: 25

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7.54 Transmembrane 32 - 48 ( 25 - 52)

INTEGRAL Likelihood = -4.09 Transmembrane 63 - 79 ( 62 - 80) 
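
The "INTEGRAL" lines report, for each predicted membrane-spanning segment, a likelihood value, the segment itself and a wider range in parentheses. As a minimal sketch, assuming exactly this line layout, the segments can be collected as follows; `Segment` and `parse_segments` are hypothetical helper names introduced only for illustration.

```python
import re
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    likelihood: float          # value printed after "Likelihood ="
    core: Tuple[int, int]      # residues printed after "Transmembrane"
    extended: Tuple[int, int]  # residues printed in parentheses


SEGMENT_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text: str) -> List[Segment]:
    """Return all INTEGRAL segments found in text, ordered by start residue."""
    segments = []
    for m in SEGMENT_RE.finditer(text):
        likelihood = float(m.group(1))
        start, end, ext_start, ext_end = (int(g) for g in m.groups()[1:])
        segments.append(Segment(likelihood, (start, end), (ext_start, ext_end)))
    return sorted(segments, key=lambda s: s.core[0])
```

For the two lines above this yields two segments, at residues 32 - 48 and 63 - 79.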

Final Results 

bacterial membrane Certainty=0.4015 (Affirmative) < suco

bacterial outside --- Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 9933> which encodes amino acid sequence <SEQ ID 9934> 
was also identified. A related GBS nucleic acid sequence <SEQ ID 10777> which encodes amino acid 
sequence <SEQ ID 10778> was also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1895

A DNA sequence (GBSx2003) was identified in S.agalactiae <SEQ ID 5883> which encodes the amino 
acid sequence <SEQ ID 5884>. This protein is predicted to be TrsK-like protein (traK). Analysis of this 
protein sequence reveals the following: 

Possible site: 34 
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.38 Transmembrane 66 - 82 ( 62 - 85) 

Final Results 

bacterial membrane Certainty=0.3951 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAG38037 GB:AF295925 Orf21 [Streptococcus pneumoniae]
Identities = 343/457 (75%), Positives = 385/457 (84%), Gaps = 24/457 (5%)

Query: 142
 + VIGGSG+GKTFRFVKPNLIQ+N SNIVVDPKDHIiAEKTGKLFLE+GYQVKVLDLvNM
Sbjct: 1

Query: 202
 NSDGFNPFRY+ETFJ5IDLNRML VYFNNTKG+GSRSDPFWDEASMTLVRA-hASYLVDFYNP
Sbjct: 61

Query: 262
 p + K+E E R+KRGR F E 4- + + KS
Sbjct: 121

Query: 301
 +LE+LFE+YAKKYG ENFTMRNWADFQNYKDKTLDSVIAVTTAKFALFNIQSV+DLT+RD
Sbjct: 180

Query: 361
 T+D+KTWG +K+MVYLVIPDND+TFRFLSAL FF+ FT + + + + +LP+HVR
Sbjct: 240

Query: 419
 YLDEFAN+GEIPDFAEQTSTVRSRNMSLVPILQNIAQLQGLYKEKEA.WKTILGNCDSL+Y
Sbjct: 300

Query: 479
 LGGNDE+TFKFMSGLLGKQT+DVR+TSRSFGQTGS S SHQKIARDLMT DEVG MKR E
Sbjct: 360

Query: 539
 CLVRIA +PVF++KKY KH +WK LA++ETD+R W
Sbjct: 420



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8929> and protein <SEQ ID 8930> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 5
McG: Discrim Score: 5.53
GvH: Signal Score (-7.5): -0.78

Possible site: 34
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 1 value: -7.38 threshold: 0.0
INTEGRAL Likelihood = -7.38 Transmembrane 66 - 82 ( 62 - 85)
PERIPHERAL Likelihood = 1.75 338
modified ALOM score: 1.98



*** Reasoning Step: 3 



Final Results 

bacterial membrane Certainty=0. 3951 (Affirmative) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 



The protein has homology with the following sequences in the databases:

33.9/50.9% over 419aa
Lactococcus lactis
GP|3582206| trsK protein (traK) Insert characterized
PIR|T43089|T43089 transfer complex protein TrsK - plasmid pMRC01 Insert characterized

ORF00383(715 - 2004 of 2415)
GP|3582206|gb|AAC56002.1||AE001272(23 - 442 of 530) trsK protein (traK) {Lactococcus lactis}
PIR|T43089|T43089 transfer complex protein TrsK - Lactococcus lactis plasmid pMRC01
%Match = 10.1
%Identity = 33.8 %Similarity = 50.8

Matches = 141 Mismatches = 193 Conservative Sub.s = 71

519 549 579 609 639 669 699 729 

SFLAFILGVLMMTLVYLWSTGQKVYREGEEYGSARFGTSKEKRNFYSKNPFNDT^ 
 I : |:|::

MNGTILGVLDNKIIYQDNTTKPNRNVM 
10 20 

759 789 816 846 876 906 936 966 

VIGKSGAGKTFRFVKPNLIQLNCSNIW-DPKDHLAEKTG

lllllh II I II -III III I III = I Ihl |::: II =11 =111 hi = 

VIGGSGSYKTQSWITNLFNETKNSIWTDPKGELYEKTAGIKIAQGY^ 

40 50 60 70 80 90 100 

996 1026 1056 1086 1116 1146 1176 1194

TVYFNIOTKGNGSRSDPFWDEASMTLVRAIASYLVDFYNPPGSSKQEQEMIRKRGRYPAFSEIGKLIKLLSKGD NQD 






TKI VQSENAEGKK- -DVWFSTQRQLLKALILFVM KERSPEQRNLAGVINVLQTFDSEPINKD 

120 130 140 150 160 



1221 1251 1281 1311 1341 1371 1401 1431 

K-SILEVLFEDYAKKYGHENFTMR1WADFQNYKDKTLDSVIAV^ 

: | |: || I I I I : I 1 = 1= I = I = I = = l 1= I =1 -I- 

ENSDLDNLF - - LALKITHPARIAYELG- FKKAKGDMKAS 1 1 SSLLATI SKFTDEEVSNFTS I SDFHLQD IGRKKI VLYVI 
180 190 200 210 220 230 240

1461 1491 1521 1551 1581 1611 1641 1671 

IPDNDTTFRFLSALFFSWFSTLTRQADVDFEGQLPIHVRSYLDEFANVGEIPDFAEQTSTVRSRNMSLVPILQNIAQLQ 

lllh: Mil :| I : I : HI I MM MM I » I :| I = = I I = III 
IPVMDNTYESFINLFFSQMFDELYKLASSN-GAKLPQEVDFILDEFVNLGKFPKYEEFL^^

260 270 280 290 300 310 320 

1701 1731 1761 1791 1809 1839 1869 1899 

GLYKEKEAWKTILGNCDSLLYLGGNDEETFKFMSGLLGKQTVDVR STSRSFGQTGSSSTSHQKIARDLMTADEVGT 

|| |] ::|||1 = I "1 I M I III! II I 111 1 = I I M =1 111 11=

SLY-GKEKAESILGNHAVKICLNASNEATAKYFSELLGKSTVKVETGSESTSHSKETSTSKSDSYSYTSRQLMTPDEIIR 
340 350 360 370 380 390 400 

1929 1956 1974 2004 2034 2064 2094 2124 

MKRDECLVRIAG V- PVFRTK KYFPLKHKHWKLLADKETDDRWWNYHINPLAKEEELDLSDYQIRDLSTETSLH**K



MPDTQSLLIFTNQKPIKATKAFQFKLFPDADSKVl^EQNKYVGITSKSQL 

420 430 440 450 460 470 480 

SEQ ID 5884 (GBS1 Id) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 151 (lane 6; MW 61kDa) and in Figure 182 (lane 10; MW 61kDa). It was also
expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 12 
(lane 5; MW 91.5kDa). 

Example 1896 

A DNA sequence (GBSx2004) was identified in S.agalactiae <SEQ ID 5885> which encodes the amino 
acid sequence <SEQ ID 5886>. Analysis of this protein sequence reveals the following:

Possible site: 50

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.4192 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 9935> which encodes amino acid sequence <SEQ ID 9936>
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1897 

A DNA sequence (GBSx2005) was identified in S.agalactiae <SEQ ID 5887> which encodes the amino 
acid sequence <SEQ ID 5888>. Analysis of this protein sequence reveals the following:

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3391 (Affirmative) < suco

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1898 

A DNA sequence (GBSx2006) was identified in S.agalactiae <SEQ ID 5889> which encodes the amino 
acid sequence <SEQ ID 5890>. Analysis of this protein sequence reveals the following:



Possible site: 45

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL

Final Results

bacterial membrane Certainty=0 . 5012 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 9937> which encodes amino acid sequence <SEQ ID 9938>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAA11325 GB:D78257 ORF8 [Enterococcus faecalis]
Identities = 35/102 (34%), Positives = 57/102 (55%), Gaps = 4/102 (3%)

Query: 90 TRNQAVLVQVGKQVPPIIFLLFL-VNASILEEIVYRQLLWEKLTF--PFEQIGVTSFLFV 146
 T N + L+++ V P++ +L L + A I+EEIV+R + L I ++SFLF
Sbjct: 7 TANDSTLIKLFSGVSPVLVVLLLGIAAPIMEEIVFRGGIIGYLVENNALLAILISSFLFG 66

Query: 147 LSHGPNQLGSWLIYSCLGLTIAWRLKT-DCMTAIALHLLWN 187
 + HGP S+ +Y +G+ L+V KT D +I++H L N
Sbjct: 67 IIHGPTNFISFGMYFFMGIILSVSYYKTKDLRVSISIHFLNN 108

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8931> and protein <SEQ ID 8932> were also identified. Analysis of this
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 4 
McG: Discrim Score: 9.32 
GvH: Signal Score (-7.5): -5.41 

Possible site: 45 
>>> Seems to have an uncleavable N-term signal seq 



ALOM program count: 6 value: -10.03 threshold: 0.0
PERIPHERAL Likelihood = 1.38 131











modified ALOM score: 2.51 



*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0 . 5012 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

ORF01326(568 - 861 of 1188) 

EGAD| 14826l| 158156 (7 - 108 of 120) hypothetical protein {Enterococcus faecalis} 
GP|1402529|dbj|BAA11325.1||D78257 ORF8 {Enterococcus faecalis}
%Match =5.9 

%Identity =34.7 %Similarity = 60.4 

Matches = 35 Mismatches = 37 Conservative Sub.s = 26 

303 333 363 393 423 453 483 513 

Y*L*RFI*EVTMIRIVLFYIAIQLNGLLVSLFLKEYLTIEGI VLLQLVLLSVTCLEIARH 

543 573 603 633 660 690 714 744 

FVAMVAFAVFISFLFPVQTRNQAVLVQVGKQVPPIIFLLFL- VNASILEEIVYRQLLWEKLT- -FP-FEQIGVTSFLFVLS 
| | : |:s, | |:: :|s| : | : | :, | ::|||| : 

MQGHTTTANDSTLIKLFSGVSPVLWLLLGIAAPIMEEIVFRGGIIGYLVENNALLAILISSFLFGII 
10 20 30 40 50 60 

774 804 831 861 891 921 951 981 

HGPNQLGSWLIYSCLGLTLAvTOLKT-DCMTAIALHLLWNSLAYVVTFL*YQNQECFRIMEAPYV**GIEKRGGHYVI*T 

III = 1= :| :|: hi II I :| = = hl I 
HGPTNFISFGMYFFMGIILSVSYYKTKDLRVSISIHFLNNLFPAIAIAYGLI 
80 90 100 110 120 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1899 

A DNA sequence (GBSx2007) was identified in S.agalactiae <SEQ ID 5891> which encodes the amino 
acid sequence <SEQ ID 5892>. Analysis of this protein sequence reveals the following: 

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0. 2490 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 9939> which encodes amino acid sequence <SEQ ID 9940> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1900 

A DNA sequence (GBSx2008) was identified in S.agalactiae <SEQ ID 5893> which encodes the amino 
acid sequence <SEQ ID 5894>. Analysis of this protein sequence reveals the following: 

Possible site: 46 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0. 5298 (Affirmative) < suco 

bacterial membrane --- Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC98423 GB:L29323 unknown [Streptococcus pneumoniae] 
Identities = 68/126 (53%), Positives = 88/126 (68%) 

Query: 1 MNLLHKKSILDCTELEERIHQAETNQLLQKILSLPNFDCDFEVTFEDDYHKEMNDPLFYE 60 

M L+K+SILDC ELE +H AE QL ++I +PN+ C+FEVTF DDYHK+ N PLFYE 
Sbjct: 1 MKALNKESILDCDELETELHDAEIKQLDEQIFLMPNYPCEFEVTFLDDYHKKHNYPLFYE 60 

Query: 61 SNLHQISDFMETRDIKNGVDTLLTKDNHLAFRAFGENYSARGKEGILTTLVTVKCFGEGR 120 

S L I +F+E++DIKNG D + +L F +G+ Y A GKEGILTT VTVK F E + 
Sbjct: 61 SYLQNIMEFLESQDIKNGADAFVDDHQNLVFVLYGQGYRAEGKEGILTTQVTVKAFDEDK 120 

Query: 121 MPIDMS 126 
PI+ + 

Sbjct: 121 KPINFA 126 
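
The "Identities" figure reported with each alignment is the number of aligned positions carrying the same residue in both sequences, over the alignment length. The following is a minimal sketch of that count, assuming the two aligned strings have equal length and use "-" for gaps; `alignment_stats` and `percent` are illustrative names. The percentages reported in this document appear to be truncated rather than rounded (68/126 is reported as 53%), so integer division is used here; that reading is an assumption.

```python
def percent(count, length):
    """Whole-number percentage, truncated (assumed to match the reported figures)."""
    return count * 100 // length


def alignment_stats(query, subject, gap="-"):
    """Count identities and gap columns in two aligned sequences of equal length."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != gap)
    gaps = sum(1 for q, s in zip(query, subject) if q == gap or s == gap)
    length = len(query)
    return {
        "identities": identities,
        "gaps": gaps,
        "length": length,
        "identity_percent": percent(identities, length),
    }
```

For example, the last alignment block above (MPIDMS against KPINFA) has identities at two of its six positions, P and I.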

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1901 

A DNA sequence (GBSx2009) was identified in S.agalactiae <SEQ ID 5895> which encodes the amino 
acid sequence <SEQ ID 5896>. This protein is predicted to be methyl transferase. Analysis of this protein 
sequence reveals the following: 

Possible site: 46 

>>> Seems to have no N-terminal signal sequence 
Final Results

bacterial cytoplasm Certainty=0.1209 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC98421 GB:L29323 methyl transferase [Streptococcus pneumoniae] 
Identities = 323/449 (71%) , Positives = 389/449 (85%) , Gaps = 3/449 (0%) 

Query: 1 MKFLDLFAGIGGFRLGMESQGHKCLGFCEIDKFARTSYKAMFNTEGEIEYHDIKEVTDHD 60 

M+F+DLF+GIGGFRLGMES GH+C+GFCEIDKFAR SYK++F TEGEIE+HDI++V+D + 
Sbjct: 1 MRFIDLFSGIGGFRLGMESVGHECIGFCEIDKFARESYKSIFQTEGEIEFHDIRDVSDDE 60 

Query: 61 FRQFRGQVDIICGGFPCQAFSLAGRRLGFEDTRGTLFFEIARAAKQIQPRFLFLENVKGL 120 

F++ RG+VD+ICGGFPCQAFS+AGRRLGFEDTRGTLFFEIARAAKQIQPRFLFLENVKGL 
Sbjct: 61 FKKLRGKVDVICGGFPCQAFSIAGRRLGFEDTRGTLFFEIARAAKQIQPRFLFLENVKGL 120 

Query: 121 LNHDEGRTFATILSTLDELGYDVEWQVLNSKDFQVPQNRERVFIIGHSRRYRSRFIFPLR 180 

LNHD+GRTF TIL+TLDELG+DVEWQ+LNSKDF VPQNRERVFI IGHSR+ +R FP R 
Sbjct: 121 LNHDKGRTFTTILTTLDELGFDVEWQMLNSKDFGVPQNRERVFIIGHSRKRGTRLGFPFR 180

Query: 181 RED SPAHLERLGNINPSKHGLNGEVYLTSGIAPTLTRGKGEGAKIAIPVLTPDRLEK 237 

RE +P L+ LGN+NPSK G++G+VY + GLAPTL RGKGEG KIAIP +TPDRL+K 
Sbjct: 181 REGQATNPETLKILGNLNPSKSGMSGKVYYSEGLAPTLVRGKGEGFKIAIPCMTPDRLDK 240 

Query: 238 RQHGRRFKDNQDPMFTLTSQDKHGVWAGNLPTSFDQTGRVFDISGLSPTLTTMQGGDKV 297 

RQ+GRRFKDNQ+ PMFTL +QD+HG+W G+LPTSF +TGRV+ GLSPTLTTMQGGDK+ 
Sbjct: 241 RQNGRRFKDNQEPMFTLNTQDRHGIVWGDLPTSFKETGRVYGSEGLSPTLTTMQGGDKI 300 

Query: 298 PKILLREELPFLKIKEATKTGYAI<ATLGDSVNLAYPDSTKRRGRVGKGISNTLTTSDNMG 357 

PKIL+ E + FLK++EATK GYA+A +GDS+NL P S RRGRVGKGI +NTLTTS MG 
Sbjct: 301 PKILIPEPIQFLKVREATKKGYAQAEIGDSINLERPSSQHRRGRVGKGIANTLTTSGQMG 360 

Query: 358 VWAALEYRQDKWYEOTGIVLEGKLYRLRIRRLTPRECFRLQGFPDWAYERAESVSSKSQ 417 
VWA+ E + Y+V G++++G+ YRLRIRR+TP+ECFRLQGFPDWA+E A VSS SQ 
Sbjct: 361 VWASYEGEDKQvYQVAGVLIDGQFYRLRIRRITPKECFRLQGFPDWAFEAARKVSSNSQ 420

Query: 418 LYKQAGNSVTVTVIEAIAREFRRTEEEEK 446 

LYKQAGNSVTV VI AIA++ + EE+++ 
Sbjct: 421 LYKQAGNSVTVPVIAAIAKKLKEVEEKDE 449 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2435> which encodes the amino acid 
sequence <SEQ ID 2436>. Analysis of this protein sequence reveals the following:

Possible site: 49

>>> Seems to have no N-terminal signal sequence



An alignment of the GAS and GBS proteins is shown below. 

Identities = 60/75 (80%) , Positives = 69/75 (92%) 

Query: 1 MKFLDLFAGIGGFRLGMESQGHKCLGFCEIDKFARTSYKAMFNTEGEIEYHDIKEVTDHD 60

MKFLDLFAGIGGFRLG+ +Q H+C+GFCEIDKFAR SYKA++ TEGEIE+HDI ++VTD D 
Sbjct: 4 MKFLDLFAGIGGFRLGLINQCHECIGFCEIDKFARQSYKAIYETEGEIEFHDIRQVTDQD 63 

Query: 61 FRQFRGQVDIICGGF 75 

FRQ RGQVDIICGGF 
Sbjct: 64 FRQLRGQVDIICGGF 78



Final Results 



bacterial cytoplasm Certainty=0. 1725 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1902

A DNA sequence (GBSx2010) was identified in S.agalactiae <SEQ ID 5897> which encodes the amino 
acid sequence <SEQ ID 5898>. Analysis of this protein sequence reveals the following: 

Possible site: 16 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -9.71 Transmembrane 8- 24 ( 3- 30) 

Final Results 

bacterial membrane Certainty=0. 4885 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 9941> which encodes amino acid sequence <SEQ ID 9942>
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 5899> which encodes the amino acid 
sequence <SEQ ID 5900>. Analysis of this protein sequence reveals the following: 

Possible site: 28

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.81 Transmembrane 20 - 36 ( 19 - 36)

Final Results 

bacterial membrane Certainty=0. 1723 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 16/33 (48%) , Positives = 26/33 (78%) 

Query: 1 MNKMIWWILGGIYLISIIILIVEIIRAPEMDDH 33

 ++KM WW+L G++ + I LI+E+I APEM+D+
Sbjct: 12 VSKMFWWLLLGVWGLRTIWLIIEVITAPEMEDY 44

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1903 

A DNA sequence (GBSx2011) was identified in S.agalactiae <SEQ ID 5901> which encodes the amino 
acid sequence <SEQ ID 5902>. This protein is predicted to be ifn-response binding factor 1 (irebf-1). 
Analysis of this protein sequence reveals the following: 

Possible site: 53

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 4771 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD41248 GB:AF106927 unknown [Streptococcus suis] 
Identities = 258/272 (94%), Positives = 266/272 (96%)

Query: 1 MKRITANQYQTSERYYKLPKILFESERYKDMKLEVKVAYAVLKDRLELSLSKGWIDEDGA 60
 MKRITANQYQTSERYYKLPKILFESERYKDMKLEVKVAYAVLKDRLELSLSKGWIDEDGA
Sbjct: 1 MKRITANQYQTSERYYKLPKILFESERYKDMKLEVKVAYAVLKDRLELSLSKGWIDEDGA 60

Query: 61 IYLIYSNSNLMALLGCSKSKLLSIKKTLREYGLIDEVQQSSSERGRMANKIYLGELEHEP 120 

IYLIYSNSNLMALLGCSKSKLLSIKKTLREYGLIDEVQQSSSE+GRMANKIYLGELEHE 
Sbjct: 61 IYLIYSNSNLMALLGCSKSKLLSIKKTLREYGLIDEVQQSSSEKGRMANKIYLGELEHET 120 

Query: 121 TPVLHTDGASVKKTLGESQRKTGPVLYSAPSETEGSETKYSETEGSDLVMKDEEERQLVD 180 

TPVLHTDGASVKKTLG SQRKTGPVL SAPSETEGSETKYSET+GSD +++DEEERQ VD 
Sbjct: 121 TPVLHTDGASVKKTLGGSQRKTGPVLNSAPSETEGSETKYSETKGSDFLIEDEEERQQVD 180 

Query: 181 EKKEENFTSKVDGVTKYDRDYIWGLVHDQLRQTGLSQSASDYAMIYFSDRYQYALEQMRF 240 

EK+EENFTSKVDGVT+YDRDYIWGLVHDQLRQTGLSQSASDYAMIYFSDRYQYALE MRF
Sbjct: 181 EKQEENFTSKVDGVTRYDRDYIWGLVHDQLRQTGLSQSASDYAMIYFSDRYQYALEHMRF 240 

Query: 241 ARSAEVIAEYVFNGVLSEWTKQLRRQEVKGGE 272

ARSAEVIAEYVFNGVLSEWTKQLRRQEVKGG+ 
Sbjct: 241 ARSAEVIAEYVFNGVLSEWTKQLRRQEVKGGD 272 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5903> which encodes the amino acid 
sequence <SEQ ID 5904>. Analysis of this protein sequence reveals the following:

Possible site: 55

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.5248 (Affirmative) < suco

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 84/122 (68%), Positives = 99/122 (80%), Gaps = 2/122 (1%)

Query: 145 VLYSAPSETEGSETKYSETEGSDLVMKDEEERQLVD--EKKEENFTSKVDGVTKYDRDYI 202

VL SAPSETE SET+ SET+ S+LV++DEEER+ +K E +FT +VD VTKYD+DYI 

Sbjct: 1 VLNSAPSETEKSETEGSETKESNLVIEDEEERKECTSVKKTEGHFTRQVDQVTKYDKDYI 60 


Query: 203 WGLVHDQLRQTGLSQSASDYAMIYFSDRYQYALEQMRFARSAEVIAEYVFNGVLSEWTKQ 262 

W LVH QLR+ GLSQ+ASD M YF +RY YALE +RFAR+AE IAEYVFNGVLSEWTKQ 
Sbjct: 61 WSLVHSQLREGGLSQAASDLVMSYFEERYAYALEHIRFARTAEAIAEYVFNGVLSEWTKQ 120 

Query: 263 LR 264

LR 

Sbjct: 121 LR 122 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1904 

A DNA sequence (GBSx2012) was identified in S.agalactiae <SEQ ID 5905> which encodes the amino 
acid sequence <SEQ ID 5906>. Analysis of this protein sequence reveals the following: 

Possible site: 17 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 4191 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco



A related GBS nucleic acid sequence <SEQ ID 9375> which encodes amino acid sequence <SEQ ID 9376>
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1905 

A DNA sequence (GBSx2013) was identified in S.agalactiae <SEQ ID 5907> which encodes the amino 
acid sequence <SEQ ID 5908>. Analysis of this protein sequence reveals the following: 

Possible site: 15

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 3723 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1906 

A DNA sequence (GBSx2014) was identified in S.agalactiae <SEQ ID 5909> which encodes the amino 
acid sequence <SEQ ID 5910>. Analysis of this protein sequence reveals the following: 

Possible site: 44

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0 . 3053 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1907 

A DNA sequence (GBSx2015) was identified in S.agalactiae <SEQ ID 5911> which encodes the amino
acid sequence <SEQ ID 5912>. This protein is predicted to be 50S ribosomal protein L7/L12 (rplL).
Analysis of this protein sequence reveals the following: 

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1034 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 9943> which encodes amino acid sequence <SEQ ID 9944>
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11881 GB:Z99104 ribosomal protein L12 (BL9) [Bacillus subtilis]

Identities = 83/123 (67%) , Positives = 95/123 (76%) , Gaps = 2/123 (1%) 

Query: 6 MALNIENIIAEIKEATILELNDLVKAIEEEFGVTAAAPVAAA--AAGGEAAAAKDSFDVE 63
 MALNIE IIA +KEAT+LELNDLVKAIEEEFGVTAAAPVA A AA G AA + FD+
Sbjct: 1 MALNIEEIIASVKEATVLELNDLVKAIEEEFGVTAAAPVAVAGGAAAGGAAEEQSEFDLI 60

Query: 64 LTAAGDKKVGVIKVVREITGEGLKEAKAIVDNAPSVIKEGASEAEANEIKEKLEAAGASV 123

 L AG +K+ VIKVVREITG GLKEAK +VDN P +KEG ++ EA E+K KLE GASV
Sbjct: 61 LAGAGSQKIKVIKVVREITGLGLKEAKELVDNTPKPLKEGIAKEEAEELKAKLEEVGASV 120

Query: 124 TLK 126 
+K 

Sbjct: 121 EVK 123 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5913> which encodes the amino acid
sequence <SEQ ID 5914>. Analysis of this protein sequence reveals the following:

Possible site: 51

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1164 (Affirmative) < suco

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below.

Identities = 104/126 (82%) , Positives = 113/126 (89%) 

Query: 1 MEEITMALNIENIIAEIKEATILELNDLVKAIEEEFGVTAAAPVAAAAAGGEAAAAKDSF 60
 +EEITMALNIENIIAEIKEA+ILELNDLVKAIEEEFGVTAAAPVAAAAAGG AAKDSF
Sbjct: 1 LEEITMALNIENIIAEIKEASILELNDLVKAIEEEFGVTAAAPVAAAAAGGAEEAAKDSF 60

Query: 61 DVELTAAGDKKVGVIKVVREITGEGLKEAKAIVDNAPSVIKEGASEAEANEIKEKLEAAG 120

 DVELT+AGDKKVGVIK VREITG GLKEAK +VD AP+ +KEG + AEA EIK KLE AG
Sbjct: 61 DVELTSAGDKKVGVIKAVREITGLGLKEAKGLVDGAPANVKEGVAAAEAEEIKAKLEEAG 120

Query: 121 ASVTLK 126 

A++TLK 
Sbjct: 121 ATITLK 126 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1908 

A DNA sequence (GBSx2017) was identified in S.agalactiae <SEQ ID 5915> which encodes the amino 
acid sequence <SEQ ID 5916>. This protein is predicted to be ribosomal protein L10 (rplJ). Analysis of this 
protein sequence reveals the following:

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1251 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11880 GB:Z99104 ribosomal protein L10 (BL5) [Bacillus subtilis] 
Identities = 96/164 (58%), Positives = 125/164 (75%), Gaps = 1/164 (0%) 

Query: 14 MSEAIIAKKAEQVELIAEKMKAAASIVVVDSRGLTVEQDTNLRRSLRESDVEFKVIKNSI 73

MS AI KK VE IA K+K + S ++VD RGL V + T LR+ LRE++VE KV KN++ 
Sbjct: 1 MSSAIETKKVV-WEIASKLKESKSTIIVBYRGIjWSEOTELRKQLREANVESKVYKNTM 59 

Query: 74 LTRAAEKAGLEDLKELFVGPSAVAFSNEDVIAPAKVISDFAKDAEALEIKGGSVDGKFTS 133 

RA E+A L L + GP+A+AFS EDV+APAKV++DFAK+ EALEIK G ++GK ++ 
Sbjct: 60 TRRAVEQAEmGLNDFLTGPNAIAFSTEDWAPAKVLNDFAKIfflFJ^EIKAGVIEGKVST 119 

Query: 134 VEEINALAKLPNKEGMLSMLLSVLQAPVRNVAYAVKAVAEKDEE 177

VEE+ ALA+LP +EG+LSMLLSVL+APVRN+A A KAVAE+ EE 
Sbjct: 120 VEEvTCAIAELPPREGLLSMLLSVLKAPvRNLAIAAKAVAEQKEE 163 

A related DNA sequence was identified in S. pyogenes <SEQ ID 5917> which encodes the amino acid 
sequence <SEQ ID 5918>. Analysis of this protein sequence reveals the following:

Possible site: 56 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -5.47 Transmembrane 7 - 23 ( 5 - 24) 

Final Results

bacterial membrane Certainty=0 . 3187 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below.

Identities = 149/176 (84%), Positives = 162/176 (91%)

Query: 4 SQKIKTEVKLMSEAIIAKKAEQVELIAEKMKAAASIVVVDSRGLTVEQDTNLRRSLRESD 63
 S KIKTEVKLMSEAIIAKKAEQVELIAEKMKAAASIV+VDSRGLTV+QDT LRRSLRES
Sbjct: 23 SPKIKTEVKLMSEAIIAKKAEQVELIAEKMKAAASIVIVDSRGLTVDQDTVLRRSLRESG 82

Query: 64 VEFKVIKNSILTRAAEKAGLEDLKELFVGPSAVAFSNEDVIAPAKVISDFAKDAEALEIK 123

 VEFKVIKNSILTRAAEKAGL++LK++FVGPSAVAFSNEDVIAPAKVI+DF K A+ALEIK
Sbjct: 83 VEFKVIKNSILTRAAEKAGLDELKDVFVGPSAVAFSNEDVIAPAKVINDFTKTADALEIK 142

Query: 124 GGSVDGKFTSVEEINALAKLPNKEGMLSMLLSVLQAPVRNVAYAVKAVAEKDEEVA 179

 GG+++G +S EEI ALA LPN+EGMLSMLLSVLQAPVRNVAYAVKAVAE E A
Sbjct: 143 GGAIEGAVSSKEEIQALATLPNREGMLSMLLSVLQAPVRNVAYAVKAVAENKEGAA 198



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1909 

A DNA sequence (GBSx2018) was identified in S.agalactiae <SEQ ID 5919> which encodes the amino 
acid sequence <SEQ ID 5920>. Analysis of this protein sequence reveals the following: 

Possible site: 40

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7.22 Transmembrane 125 - 141 ( 106 - 143) 
INTEGRAL Likelihood = -1.91 Transmembrane 108 - 124 ( 106 - 124) 

Final Results

bacterial membrane Certainty=0 .3888 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 




A related GBS nucleic acid sequence <SEQ ID 10931> which encodes amino acid sequence <SEQ ID 
10932> was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1910 

A DNA sequence (GBSx2019) was identified in S.agalactiae <SEQ ID 5921> which encodes the amino 
acid sequence <SEQ ID 5922>. This protein is predicted to be Clp-like ATP-dependent protease binding 
subunit (clpC). Analysis of this protein sequence reveals the following: 

Possible site: 31

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0 . 3483 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA68910 GB:L34677 Clp-like ATP-dependent protease binding 
subunit [Bos taurus] 
Identities = 437/589 (74%) , Positives = 514/589 (87%) , Gaps = 5/589 (0%) 



Query: 10 DPFGN-MDDIFNSLMGNMGGYNSENKRYLINGREVTPEEFSQYRQTGKLPGQELNNQNTP 68
DPF N MDD+FN LMG M G NSEN+RYLINGREVTPEE++ +RQTGKLPG Q
Sbjct: 2 DPFNNDMDDLFNQLMGGMNGVNSENRRYLINGREVTPEEYAAFRQTGKLPGVTDPTQ-AK 60

Query: 69 TNQVSADSVLTKLGTNLTDQARQHLLDPVIGRNKEIQETAEILARRTKNNPVLVGDAGVG 128
T Q DS+L KLG NLT +A++ LDPVIGRNKEIQETAEIL+RRTKNNPVLVGDAGVG
Sbjct: 61 TKQPQPDSMLAKLGRNLTQEAKEGKLDPVIGRNKEIQETAEILSRRTKNNPVLVGDAGVG 120

Query: 129 KTAVIEGLAQAIINGDVPAAIKNKEIISIDISSLEAGTQYRGSFEENIQNIIKEVKETGN 188
KTAV+EGLAQAI+ GDVPAAIKNK+IISIDISSLEAGTQYRGSFEEN+Q +I EVK+ GN
Sbjct: 121 KTAVVEGLAQAIVAGDVPAAIKNKQIISIDISSLEAGTQYRGSFEENMQKLIDEVKKDGN 180

Query: 189 IILFFDEIHQILGAGSTGGDSGSKGLADILKPALSRGELTVIGATTQDEYRNTILKNAAL 248
+ILFFDEIHQI+GAG+ G SGSKG+ADILKPALSRGE+T+IGATTQDEYRNTILK+AAL
Sbjct: 181 VILFFDEIHQIIGAGNAGDASGSKGMADILKPALSRGEVTLIGATTQDEYRNTILKDAAL 240

Query: 249 ARRFNEVKVNAPSAQDTFNILMGIRNLYEQHHNVVLPDSVLKAAVDLSIQYIPQRSLPDK 308
+RRFN+V VNAPS +DTF IL G+R LYE+HHNV LPD VLKAA+D S+QYIPQRSLPDK
Sbjct: 241 SRRFNQVTVNAPSKEDTFKILQGLRKLYEKHHNVSLPDEVLKAAIDYSVQYIPQRSLPDK 300

Query: 309 AIDLIDMTAAHLAAQHPVTDLKSLEKEIAEQRDKQEKAVNTEDFEEALKVKTRIEELQNQ 368
AIDLID+TAAHLA++HPV D K++E+EI + KQ++AV ED++ A + K ++ +LQ+Q
Sbjct: 301 AIDLIDVTAAHLASKHPVKDAKTIEEEIKKTEAKQQEAVEKEDYQAAQEAKDQVAKLQDQ 360

Query: 369 IDNHTEGQKVTATINDIAMSIERLTGVPVSNMGASDIERLKELGNRLKGKVIGQNDAVEA 428
+ +H+E ++V AT +D+A ++ER+TG+PVS MGASDIERLK L RL+GKVIGQ +AVEA
Sbjct: 361 LKDHSESERVVATPSDVAAAVERMTGIPVSKMGASDIERLKGLATRLEGKVIGQQEAVEA 420

Query: 429 VARAIRRNRAGFDDGNRPIGSFLFVGPTGVGKTELAKQLAFDMFGSKDAIVRLDMSEYND 488
V+RAIRRNRAGFD+GNRPIGSFLFVGPTGVGKTELAKQLA DMFGS + I+RLDMSEY D
Sbjct: 421 VSRAIRRNRAGFDEGNRPIGSFLFVGPTGVGKTELAKQLALDMFGSTNDIIRLDMSEYTD 480

Query: 489 RTAVSKLIGATAGYVGYDDNSNTLTERIRRNPYSIVLLDEIEKADPQVITLLLQVLDDGR 548
RTAVSKLIG TAGYVGYDDNSNTLTE++RR+PYSIVLLDEIEKA+PQVITLLLQVLDDGR
Sbjct: 481 RTAVSKLIGTTAGYVGYDDNSNTLTEKVRRHPYSIVLLDEIEKANPQVITLLLQVLDDGR 540

Query: 549 LTDGQGNTINFKNTVIIATSNAGFGNEAFTGDSDKDLKIMERISPYFRP 597
LTDGQGNT++FKNT+IIATSNAGF ++A G+ D K+M+++ PYFRP
Sbjct: 541 LTDGQGNTVDFKNTIIIATSNAGFSSDAVAGE DAKLMDKLQPYFRP 586 
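
Each alignment in this document is preceded by a summary line giving identities, positives and gaps as counts over the aligned length. As a rough sketch, such a line can be parsed and the underlying fractions recomputed in Python. The function name `parse_alignment_stats` is hypothetical and not part of the original analysis. The printed whole-number percentages are kept out of the calculation, since only the counts are used.

```python
import re

# Hypothetical helper, not part of the original analysis pipeline.
_STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")

def parse_alignment_stats(line):
    """Return {name: (count, aligned_length, fraction)} for a summary line."""
    stats = {}
    for name, count, length in _STAT.findall(line):
        count, length = int(count), int(length)
        stats[name.lower()] = (count, length, count / length)
    return stats

line = "Identities = 437/589 (74%) , Positives = 514/589 (87%) , Gaps = 5/589 (0%)"
stats = parse_alignment_stats(line)
for name, (count, length, frac) in stats.items():
    print(f"{name}: {count}/{length} = {100 * frac:.1f}%")
```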

A related DNA sequence was identified in S.pyogenes <SEQ ID 5923> which encodes the amino acid
sequence <SEQ ID 5924>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2718 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 551/697 (79%), Positives = 616/697 (88%), Gaps = 3/697 (0%) 



Query: 5 NFYNRDPFGNMDDIFNSLMGNMGGYNSENKRYLINGREVTPEEFSQYRQTGKLPGQELNN 64
+F +DPF NMDDIFN LM NMGGY SEN RYL+NGRE+TPEEF YRQTG+LP
Sbjct: 3 tlC ov3iVLJ.tr r V IN rlXJJJ X r IN yijriHIN 1 v 1VjV3 1 rCO EjIN iri\. x xj V i\ vartu x x Jr EjHiT yn x Jvv; J- W 11 ~ v.rt.x x r\j-s. 62

Query: 65 QNTPTNQVSADSVLTKLGTNLTDQARQHLLDPVIGRNKEIQETAEILARRTKNNPVLVGD 124
N+ ADSVLT+LGTNLT +ARQ LDPVIGRNKEIQ+TAEILARRTKNNPVLVGD
Sbjct: 63 TNSQMLTPKADSVLTQLGTNLTQEARQGHLDPVIGRNKEIQDTAEILARRTKNNPVLVGD 122

Query: 125 AGVGKTAVIEGLAQAIINGDVPAAIKNKEIISIDISSLEAGTQYRGSFEENIQNIIKEVK 184
AGVGKTAVIEGLAQAI+NGDVPAAIKNKEI+SIDISSLEAGTQYRGSFEE IQN+I+EVK
Sbjct: 123 AGVGKTAVIEGLAQAIVNGDVPAAIKNKEIVSIDISSLEAGTQYRGSFEETIQNLIQEVK 182

Query: 185 ETGNIILFFDEIHQILGAGSTGGDSGSKGLADILKPALSRGELTVIGATTQDEYRNTILK 244
E GNIILFFDEIHQI+GAG+T DSGSKGLADILKPALSRGELT+IGATTQDEYRNTILK
Sbjct: 183 EAGNIILFFDEIHQIVGAGATSSDSGSKGLADILKPALSRGELTLIGATTQDEYRNTILK 242

Query: 245 NAALARRFNEVKVNAPSAQDTFNILMGIRNLYEQHHNVVLPDSVLKAAVDLSIQYIPQRS 304
NAALARRFNEVKVNAPSA+DTF+ILMGIRNLYEQHH++ LPD+VLKAAVD SIQYIPQRS
Sbjct: 243 NAALARRFNEVKVNAPSAEDTFHILMGIRNLYEQHHHITLPDNVLKAAVDYSIQYIPQRS 302

Query: 305 LPDKAIDLIDMTAAHLAAQHPVTDLKSLEKEIAEQRDKQEKAVNTEDFEEALKVKTRIEE 364
LPDKAIDL+DMTAAHLAAQHPVTDLK+LE EIA+Q++ QEKAV EDFE+AL KTRIE
Sbjct: 303 LPDKAIDLLDMTAAHLAAQHPVTDLKTLETEIAKQKESQEKAVAKEDFEKALAAKTRIET 362

Query: 365 LQNQIDNHTEGQKVTATINDIAMSIERLTGVPVSNMGASDIERLKELGNRLKGKVIGQND 424
LQ QI+ H + Q VTAT+NDIA S+ERLTG+PVSNMG +D+ERLK + +RLK VIGQ++
Sbjct: 363 LQKQIEQHNQSQNVTATVNDIAESVERLTGIPVSNMGTNDLERLKGISSRLKSHVIGQDE 422

Query: 425 AVEAVARAIRRNRAGFDDGNRPIGSFLFVGPTGVGKTELAKQLAFDMFGSKDAIVRLDMS 484
AV AVARAIRRNRAGFDDG RPIGSFLFVGPTGVGKTELAKQLA D+FGSKDAI+RLDMS
Sbjct: 423 AVAAVARAIRRNRAGFDDGKRPIGSFLFVGPTGVGKTELAKQLALDLFGSKDAIIRLDMS 482

Query: 485 EYNDRTAVSKLIGATAGYVGYDDNSNTLTERIRRNPYSIVLLDEIEKADPQVITLLLQVL 544
EYNDRTAVSKLIG TAGYVGYDDN+NTLTER+RRNPY+IVLLDEIEKADPQ+ITLLLQVL
Sbjct: 483 EYNDRTAVSKLIGTTAGYVGYDDNNNTLTERVRRNPYAIVLLDEIEKADPQIITLLLQVL 542

Query: 545 DDGRLTDGQGNTINFKNTVIIATSNAGFGNEAFTGDSDKDLKIMERISPYFRPEFLNRFN 604
DDGRLTDGQGNTINFKNTVIIATSNAGFG + + IM+RI+PYFRPEFLNRFN
Sbjct: 543 DDGRLTDGQGNTINFKNTVIIATSNAGFGQQ---DTETSESNIMDRIAPYFRPEFLNRFN 599

Query: 605 GVIEFSHLSKDDLSEIVDLMLDEVNQTIGKKGIDLVVDENVKSHLIELGYDEAMGVRPLR 664
+I+F+HL K+ L EIVDLML EVNQT KKGI L + ++ K+HLI+LGY+ AMG RPLR
Sbjct: 600 SIIKFNHLQKESLEEIVDLMLAEVNQTTAKKGISLTITDDAKAHLIDLGYNHAMGARPLR 659

Query: 665 RVIEQEIRDRITDYYLDHTDVKHLKANLQDGQIVISE 701
R+IEQEIRDRITDYYLDH +VK L+A L++GQ+VI +
Sbjct: 660 RIIEQEIRDRITDYYLDHPEVKKLQAILKEGQLVIRQ 696

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1911

A DNA sequence (GBSx2020) was identified in S.agalactiae <SEQ ID 5925> which encodes the amino
acid sequence <SEQ ID 5926>. Analysis of this protein sequence reveals the following:

Possible site: 20

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -4.78 Transmembrane 8 - 24 ( 7 - 25)

Final Results

bacterial membrane Certainty=0.2911 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 9945> which encodes amino acid sequence <SEQ ID 9946> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC73364 GB:AE000134 putative enzyme [Escherichia coli K12]

Identities = 142/307 (46%) , Positives = 195/307 (63%) , Gaps = 6/307 (1%)

Query: 39 KELLESKKTLILHGALGTELESRGCDVSGKLWSAKYLIEDPAAIQTIHEDYIRAGADIVT 98
+ LL+ + L+L GA+ TELE+RGC+++ LWSAK L+E+P I+ +H DY RAGA
Sbjct: 8 RALLDKQDILLLDGAMATELEARGCNLADSLWSAKVLVENPELIREVHLDYYRAGAQCAI 67

Query: 99 TSTYQATLQGLAQVGVSESQTEDLIRLTVQLAKAAREQVWKSLTKEEKSERIYPLISGDV 158
T++YQAT G A G+ E+Q++ LI +V+LA+ ARE L + ++ + L++G V
Sbjct: 68 TASYQATPAGFAARGLDEAQSKALIGKSVELARKAREAY---LAENPQAGTL--LVAGSV 122

Query: 159 GPYAAFLADGSEYTGLYDIDKQGLKNFHRHRIELLLDEGVDILALETIPNAQEAEALIEL 218
GPY A+LADGSEY G Y + + FHR R+E LLD G D+LA ET+PN E EAL EL
Sbjct: 123 GPYGAYLADGSEYRGDYHCSVEAFQAFHRPRVEALLDAGADLLACETLPNFSEIEALAEL 182

Query: 219 LAEDFPQVEAYMSFTSQDGKTISDGSAVADLAKAIDVSPQWALGINCSSPSLVADFLQA 278
L +P+ A+ SFT +D + +SDG+ + D+ + PQWALGINC + LQ
Sbjct: 183 LTA-YPRARAWFSFTLRDSEHLSDGTPLRDWALLAGYPQWALGINCIALENTTAALQH 241

Query: 279 IAEQTNKPLVTYPNSGEVYDGASQSWQSSPDHSHTLLENTSDWQKLGAQWGGCCRTRPA 338
+ T PLV YPNSGE YD S++W +H L + WQ GA+ + +GGCCRT PA
Sbjct: 242 LHGLTVLPLVVYPNSGEHYDAVSKTWHHHGEHCAQLADYLPQWQAAGARLIGGCCRTTPA 301

Query: 339 DIADLSA 345
DIA L A
Sbjct: 302 DIAALKA 308

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

A related GBS gene <SEQ ID 8933> and protein <SEQ ID 8934> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 5
McG: Discrim Score: 5.48
GvH: Signal Score (-7.5): -2.64
Possible site: 20
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -4.78 threshold: 0.0

INTEGRAL Likelihood = -4.78 Transmembrane 8 - 24 ( 7 - 25)
PERIPHERAL Likelihood = 2.49 259
modified ALOM score: 1.46

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.2911 (Affirmative) < suco

bacterial outside --- Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases: 

ORF01312(412 - 1338 of 1644)

OMNI|NT01EC0303 (SS - 357 of 358) conserved hypothetical protein
%Match = 23.8

%Identity = 46.6 %Similarity = 64.3

Matches = 142 Mismatches = 107 Conservative Sub.s = 54



20 



35 



288 318 348 378 408 438 468 498 

LISQSFCS*FRL*6LLGIAHNVLGFTSVFHLLFSAIFITNYVTRNGDLMGRFKELLESKKTLILHGALGTELESRGCDVS 



AWWPVLGViMSIQRRELRCGAGYRLLRCaMvLISLLNPETQNRSQNMSQ 
25 20 30 40 50 60 70 80 

528 558 588 618 648 678 708 738 

GKLWSAiaLIEDPAAIQTIHEDYIRAGADIVTTSTYQATU2^^ 

inn hi = i h =i ii mi mini m u nm n mm in i 

3 0 DSLWSAKVLVENPELIREVHLDYYFAGAQCAITASYQATPAGFAARGLD^ AYLAEN 

100 110 120 130 140 150 



768 798 828 858 888 918 948 978 

SERIYPLISGDVGPYAAFIADGSEYTGLYDIDKQGLKNFHRHRIELLLDEGVDIIALETIPNAQEAEALIELLAEDFPQV 

: mi mi miiini i i = == in m m 1 mi mil i in m m 

PCAGTLLVAGSVGPYGAYLADGSEYRGDYHCSVEAFQAFHRPRVEALLDAGADLLACETLPNFSEIEALAELLT-AYPRA 
170 180 190 200 210 220 230 



1008 1038 1068 1098 1128 1158 1188 1218 

40 FAYMSFTSQDGKTISDGSAVADIAKAIDVSPQWALGINCSSPSLVADFLQAIAEQTNKPLVTYPNSGEVYDGASQSWQS 

|: HI =] = =111= = 1= = 1111111111 ' 11 : I 111 llllll II 1-1 = 

RAWFSFTLRDSEHLSDGTPLRDWAL^GYPQWALGINCIALEOT^ 

250 260 270 280 290 300 310 

45 1248 1278 1308 1338 1368 1398 1428 1458 

SPDHSHTLLENTSDWQKLGAQWGGCCRTRPADIADLSAHLK*VKYLEEG*GKFDFLFQSTRKPAWILPNGFCFYLSEMT 

• :| 1 : I! 11 = = = 11 I II I I I I I I I I 
HGEHCAQIADYLPQWQAAGARLIGGCCRTTPADIAALKARS 
330 340 350 

SEQ ID 8934 (GBS381) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 68 (lane 6; MW 42kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 72 (lane 4; MW 66.9kDa).

Example 1912 

A DNA sequence (GBSx2021) was identified in S.agalactiae <SEQ ID 5927> which encodes the amino
acid sequence <SEQ ID 5928>. Analysis of this protein sequence reveals the following:

Possible site: 51

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2996 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1913 

A DNA sequence (GBSx2022) was identified in S.agalactiae <SEQ ID 5929> which encodes the amino
acid sequence <SEQ ID 5930>. Analysis of this protein sequence reveals the following:

Possible site: 59

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.5649 (Affirmative) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 9377> which encodes amino acid sequence <SEQ ID 9378> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB12034 GB:Z99105 similar to histidine permease [Bacillus subtilis]
Identities = 221/384 (57%), Positives = 291/384 (75%), Gaps = 2/384 (0%)

Query: 2 PVVGSFHTYATKFISPGTGFTVAWLYWICWTVALGTEFLGAAMLMQRWFPNVPAWAFASF 61
PVVG+FHTYA K+I PGTGFTVAWLYW+ WTVALG+EF A +LMQRWFP+ W +++
Sbjct: 76 PVVGAFHTYAAKYIGPGTGFTVAWLYWLTWTVALGSEFTAAGLLMQRWFPHTSVW^SAV 135

Query: 62 FALVIFGLNALSVRFFAEAESFFSSIKVIAIIIFIILGLGAMFGLVSFEGQHKAILFTHL 121
FAL IF LNA SV+FFAE+E +FSSIKV+AI ++FI+LG AMFG++ +G A + ++
Sbjct: 136 FALFIFLLNAFSWFAESEFWFSSIKVLAIVLFILLGGSAMFGIIPIKGGEAAPMLSNF 195

Query: 122 TANGA-FPNGIVAVVSVMLAVNYAFSGTELIGIAAGETDNPKEAVPRAIKTTIGRLVVFF 180
TA G FPNG V ++ ML+VN+AFSGTELIGIAAGE+ +P + +P+AIKTT+ RL +FF
Sbjct: 196 TAEGGLFPNGFVPILMTMLSVNFAFSGTELIGIAAGESVDPDKTIPKAIKTTVWRLSLFF 255

Query: 181 VLTIVVIASLLPMKEAGVSTAPFTOVFDKMGIPFTADIMNFVILTAILSAGNSGLYASSR 240
V TI VL+ L+P+++AGV +PFV VFD++G+P+ ADIMNFVILTAILSA NSGLYASSR
Sbjct: 256 VGTIFVLSGLIPIQDAGVIKSPFVAVFDRVGVPYAADIMNFVILTAILSAANSGLYASSR 315

Query: 241 MLWSLANEGMLSKSVVKINKHGVPMRALLLSMAGAVLSLFSSIYAADTVYLALVSIAGFA 300
MLWSL+ E L + K+ G P AL+ SM G +LSL SS++A DTVY+ LVSI+GFA
Sbjct: 316 MLWSLSKEKTLHPTFAKLTSKGTPFNALVFSMIGGILSLLSSVFAPDTVYVVLVSISGFA 375

Query: 301 VVVVWLAIPVAQINFRKEFLKE-NQLEDLSYKTPFTPVLPYITIILLLISIVGIAWDSSQ 359
VVVVW+ I +Q FRK +++ N++ DL Y+TP P +P +L L S+VGIA+D +Q
Sbjct: 376 VVVVWMGIAASQFMFRKRYIEAGNKVTDLKYRTPLYPFVPIAAFLLCLASVVGIAFDPNQ 435

Query: 360 RAGLYFGVPFIIFCYIYHKLRYKK 383
R LY GVPF+ CY + ++ +K
Sbjct: 436 RIALYCGVPFMAICYAIYYVKNRK 459

There is also homology to SEQ ID 4070.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1914 

A DNA sequence (GBSx2023) was identified in S.agalactiae <SEQ ID 5931> which encodes the amino
acid sequence <SEQ ID 5932>. Analysis of this protein sequence reveals the following:

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2378 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

There is also homology to SEQ ID 5642.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1915 

A DNA sequence (GBSx2024) was identified in S.agalactiae <SEQ ID 5933> which encodes the amino 
acid sequence <SEQ ID 5934>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4935 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1916 

A DNA sequence (GBSx2025) was identified in S.agalactiae <SEQ ID 5935> which encodes the amino 
acid sequence <SEQ ID 5936>. Analysis of this protein sequence reveals the following: 

Possible site: 42

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0530 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1917 

A DNA sequence (GBSx2026) was identified in S.agalactiae <SEQ ID 5937> which encodes the amino
acid sequence <SEQ ID 5938>. Analysis of this protein sequence reveals the following:

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0175 (Affirmative) < suco

bacterial membrane --- Certainty=0.0000 (Not Clear) < suco

bacterial outside --- Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF63739 GB:AF236863 hypothetical GTP-binding protein
[Lactococcus lactis]
Identities = 142/193 (73%) , Positives = 165/193 (84%)

Query: 6 LNTHNASILLSAANKSHYPQDDLPEVAIAGRSNVGKSSFINTLLGRKNLARTSSKPGKTQ 65
+NT+N +I +SAA+K YP++D PE+AIAGRSNVGKSSFINTLL RKN ARTS +PGKTQ
Sbjct: 3 INTNNLTITISAASKKQYPENDWPEIAIAGRSNVGKSSFINTLLNRKNFARTSGQPGKTQ 62

Query: 66 LLNFYNIDDKLRFVDVPGYGYAKVSKTERAKWGKMIEEYLVTRDNLRVVVSLVDFRHDPS 125
LLNFYNIDD+L FVDVPGYGYA+VSK ER KWGKMIEEYL TR+NL+ VVSLVD RH+PS
Sbjct: 63 LLNFYNIDDQLHFVDVPGYGYARVSKKEREKWGKMIEEYLTTRENLKAVVSLVDIRHEPS 122

Query: 126 ADDIQMYEFLKYYEIPVIIVATKADKIPRGKWNKHESSIKKKLNFDKKDHFIVFSSVDRT 185
DD+ MYEFLKYY IPVI+VATKADK+PRGKWNKHES IKK + FD D FI+FSS D+T
Sbjct: 123 EDDIJflMYEFLKYYHIPVILVATKRDIOTPRGKSJNKHESIIKKAMKFDSTDDFIIFSSTDKT 182

Query: 186 GLDESWDTILSEL 198
G++E+W IL L
Sbjct: 183 GIEEAWTAILKYL 195 
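
In the alignments above, a letter in the middle line marks an identical residue and a `+` marks a conservative substitution, so the "Identities" and "Positives" counts can be tallied directly from one block. The Python sketch below illustrates this for the final block of the alignment. The function name `score_alignment` is hypothetical, and the middle line has been re-spaced by hand so that its columns line up with the two sequences, which is an assumption about the original layout.

```python
def score_alignment(query, midline, sbjct):
    """Count identities and positives in one alignment block.

    A letter in the midline marks an identical residue; '+' marks a
    conservative substitution. Positives include identities.
    Returns (identities, positives, block_length).
    """
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("alignment rows must have equal length")
    identities = sum(
        1 for q, m, s in zip(query, midline, sbjct) if m.isalpha() and q == s
    )
    conservative = midline.count("+")
    return identities, identities + conservative, len(query)

# Residues 186-198 of the query against residues 183-195 of the subject.
block = score_alignment(
    "GLDESWDTILSEL",
    "G++E+W  IL  L",  # re-spaced by hand (assumption)
    "GIEEAWTAILKYL",
)
print(block)
```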



A related DNA sequence was identified in S.pyogenes <SEQ ID 5939> which encodes the amino acid
sequence <SEQ ID 5940>. Analysis of this protein sequence reveals the following:

Possible site: 18

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0123 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below.

Identities = 167/196 (85%) , Positives = 183/196 (93%)

Query: 3 EEFLNTHNASILLSAANKSHYPQDDLPEVALAGRSNVGKSSFINTLLGRKNLARTSSKPG 62
E+ LNTHNASILLSAANKSHYPQDDLPE+ALAGRSNVGKSSFINT+LGRKNLARTSSKPG
Sbjct: 4 EQVLNTHNASILLSAANKSHYPQDDLPEIALAGRSNVGKSSFINTILGRKNLARTSSKPG 63

Query: 63 KTQLLNFYNIDDKLRFVDVPGYGYAKVSKTERAKWGKMIEEYLVTRDNLRVVVSLVDFRH 122
KTQLLNF+NIDDKLRFVDVPGYGYAKVSK+ERAKWGKMIEEYL +RDNLR VVSLVD RH
Sbjct: 64 KTQLLNFFNIDDKLRFVDVPGYGYAKVSKSERAKWGKMIEEYLTSRDNLRAVVSLVDLRH 123

Query: 123 DPSADDIQMYEFLKYYEIPVIIVATKADKIPRGKWNKHESSIKKKLNFDKKDHFIVFSSV 182
PS +DIQMY+FLKYY+IPVI+VATKADKIPRGKWNKHES +KK LNFDK D FIVFSSV
Sbjct: 124 APSKEDIQMYDFLKYYDIPVIVVATKADKIPRGKWNKHESVVKKALNFDKSDTFIVFSSV 183

Query: 183 DRTGLDESWDTILSEL 198
+R G+D+SWD IL ++
Sbjct: 184 ERIGIDDSWDAILEQV 199

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1918 

A DNA sequence (GBSx2027) was identified in S.agalactiae <SEQ ID 5941> which encodes the amino
acid sequence <SEQ ID 5942>. This protein is predicted to be protease ClpX (clpX). Analysis of this
protein sequence reveals the following:

Possible site: 50

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2389 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 9947> which encodes amino acid sequence <SEQ ID 9948>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF63738 GB:AF236863 protease ClpX [Lactococcus lactis]
Identities = 305/395 (77%), Positives = 357/395 (90%), Gaps = 1/395 (0%)

Query: 18 NVYCSFCGKSQDEVKKIIAGNGVFICNECVALSQEIIKEELAEEVLADLAEVPKPKELLE 77
N+ CSFCGKSQD+VKK+IAG+ V+ICNEC+ LS I++EEL EE +++ EV PKE+ +
Sbjct: 8 NIQCSFCGKSQDDVKKMIAGSDVYICNECIELSTRILEEELKEEQDSEMLEVKTPKEMFD 67

Query: 78 ILNQYVVGQDRAKRALAVAVYNHYKRVSYTESS-DDDVDLQKSNILMIGPTGSGKTFLAQ 136
LN+ YV+GQ+ +AKRALAVAVYNHYKR+ ++T S +D+ +LQKSNIL+IGPTGSGKTFLAQ
Sbjct: 68 HLNEYVIGQEKAKRALAVAVYNHYKRINFTASKIAEDIELQKSNILLIGPTGSGKTFLAQ 127

Query: 137 TLAKSLNVPFAIADATSLTEAGYVGEDVENILLKLIQAADYNVERAERGIIYVDEIDKIA 196
TLAKSLNVPFAIADATSLTEAGYVGEDVENILLKL+QA+D+N+ERAERGIIY+DEIDKIA
Sbjct: 128 TLAKSLNVPFAIADATSLTEAGYVGEDVENILLKLLQASDFNIERAERGIIYIDEIDKIA 187

Query: 197 KKGENVSITRDVSGEGVQQALLKIIEGTVASVPPQGGRKHPNQEMIQINTKNILFIVGGA 256
KK ENVSITRDVSGEGVQQALLKIIEGTVASVPPQGGRKHPNQEMIQI+TKNILFIVGGA
Sbjct: 188 KKSENVSITRDVSGEGVQQALLKIIEGTVASVPPQGGRKHPNQEMIQIDTKNILFIVGGA 247

Query: 257 FDGIEDLVKQRLGEKVIGFGQTSRKIDDNASYMQEIISEDIQKFGLIPEFIGRLPVVAAL 316
FDGIE++VKQRLGEK+IGFG ++K+ D SYMQEII+EDIQKFGLIPEFIGRLP+VAAL
Sbjct: 248 FDGIEEIVKQRLGEKIIGFGANNKKLSDEDSYMQEIIAEDIQKFGLIPEFIGRLPIVAAL 307

Query: 317 ELLTAEDLVRILTEPRNALVKQYQTLLSYDGVELEFDQDALLAIADKAIERKTGARGLRS 376
E LT EDL++ILTEP+NAL+KQY+ LL +D VELEF AL+AIA KAIERKTGARGLRS
Sbjct: 308 ERLTEEDLIQILTEPKNALIKQYKQLLLFDNVELEFKDGALMAIAKKAIERKTGARGLRS 367

Query: 377 IIEETMLDIMFEIPSQEDVTKVRITKAAVEGTDKP 411
IIEE M+DIMFE+PS E++TKV IT+A V+G +P
Sbjct: 368 IIEEVMMDIMFEVPSHEEITKVIITEAVVDGKAEP 402

A related DNA sequence was identified in S.pyogenes <SEQ ID 5943> which encodes the amino acid
sequence <SEQ ID 5944>. Analysis of this protein sequence reveals the following:

Possible site: 42

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2711 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside --- Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 378/409 (92%) , Positives = 393/409 (95%) , Gaps = 1/409 (0%) 



Query: 9 MAGNRNNDMNVYCSFCGKSQDEVKKIIAGNGVFICNECVALSQEIIKEELAEEVLADLAE 68
MAG+R ND+ VYCSFCGKSQD+VKKIIAGN VFICNECVALSQEIIKEELAEEVLADL E
Sbjct: 1 MTxncjRTKmTTnr/pcjFr'nTfcinnnvK'K'T t A(^NT\TVFTr7\TT?(^AT.^nETTK'RFTiAVRW 1 AnTiTE 60

Query: 69 VPKPKELLEILNQYVVGQDRAKRALAVAVYNHYKRVSYTES-SDDDVDLQKSNILMIGPT 127
VPKPKELL++LNQYVVGQDRAKRAL+VAVYNHYKRVS+TES DDDVDLQKSNILMIGPT
Sbjct: 61 [illegible] 120

Query: 128 GSGKTFLAQTLAKSLNVPFAIADATSLTEAGYVGEDVENILLKLIQAADYNVERAERGII 187
GSGKTFLAQTLAKSLNVPFAIADATSLTEAGYVGEDVENILLKLIQAADYNVERAERGII
Sbjct: 121 [illegible] 180

Query: 188 YVDEIDKIAKKGENVSITRDVSGEGVQQALLKIIEGTVASVPPQGGRKHPNQEMIQINTK 247
YVDEIDKIAKKGENVSITRDVSGEGVQQALLKIIEGTVASVPPQGGRKHPNQEMIQI+TK
Sbjct: 181 YVDEIDKIAKKGENVSITRDVSGEGVQQALLKIIEGTVASVPPQGGRKHPNQEMIQIDTK 240

Query: 248 NILFIVGGAFDGIEDLVKQRLGEKVIGFGQTSRKIDDNASYMQEIISEDIQKFGLIPEFI 307
NILFIVGGAFDGIE++VKQRLGEKVIGFGQ SRKIDDNASYMQEIISEDIQKFGLIPEFI
Sbjct: 241 NILFIVGGAFDGIEEIVKQRLGEKVIGFGQNSRKIDDNASYMQEIISEDIQKFGLIPEFI 300

Query: 308 GRLPVVAALELLTAEDLVRILTEPRNALVKQYQTLLSYDGVELEFDQDALLAIADKAIER 367
GRLPVVAALE L DL++ILTEPRNALVKQYQ LLSYDGVEL FD++AL AIA+KAIER
Sbjct: 301 GRLPVVAALEQLOTSDLIQILTEPRNALVKQYQALLSYDGVELAFDKEALEAIANKAIER 360

Query: 368 KTGARGLRSIIEETMLDIMFEIPSQEDVTKVRITKAAVEGTDKPVLETA 416
KTGARGLRSIIEETMLDIMFEIPSQEDVTKVRITKAAVEG KPVLETA
Sbjct: 361 KTGARGLRSIIEETMLDIMFEIPSQEDVTKVRITKAAVEGKSKPVLETA 409





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1919 

A DNA sequence (GBSx2028) was identified in S.agalactiae <SEQ ID 5945> which encodes the amino 
acid sequence <SEQ ID 5946>. Analysis of this protein sequence reveals the following: 

Possible site: 26

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1920 

A DNA sequence (GBSx2029) was identified in S.agalactiae <SEQ ID 5947> which encodes the amino
acid sequence <SEQ ID 5948>. Analysis of this protein sequence reveals the following:

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4029 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 9949> which encodes amino acid sequence <SEQ ID 9950>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC33872 GB:AF055727 dihydrofolate reductase [Streptococcus pneumoniae]
Identities = 83/162 (51%) , Positives = 118/162 (72%) , Gaps = 1/162 (0%)

Query: 25 MTKQIIAIWAEDEDHLIGVNGGLPWRLPKELHHFKETTMGQALLMGRKTFDGMNRRVLPG 84
MTK+I+AIWA+DE+ LIG LPW LP EL HFKETT+ A+LMGR TFDGM RR+LP
Sbjct: 1 MTKKIVAIWAQDEEGLIGKENRLPWHLPAELQHFKETTLNHAILMGRVTFDGMGRRLLPK 60

Query: 85 RETIILTKDEQFQADGVTVLNSVEQVIKWFQEHNKTLFIVGGASIYKAFLPYCEAIIKTK 144
RET+ILT++ + + DGV V+ V+ W+Q+ K L+I+GG I++AF PY + +I T
Sbjct: 61 RETLILTRNPEEKIDGVATFQDVQSVLDWYQDQEKNLYIIGGKQIFQAFEPYLDEVIVTH 120

Query: 145 VHGKFKGDTYFP-DVNLSEFKVISRDYFEKDEQNAHAFTVTY 185
+H + +GDTYFP +++LS F+ +S ++ KDE+N + FT+ Y
Sbjct: 121 IHARVEGDTYFPEELDLSLFETVSSKFYAKDEKNPYDFTIQY 162

A related DNA sequence was identified in S.pyogenes <SEQ ID 5949> which encodes the amino acid
sequence <SEQ ID 5950>. Analysis of this protein sequence reveals the following:

Possible site: 51
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1214 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 82/160 (51%), Positives = 119/160 (74%)

Query: 25 MTKQIIAIWAEDEDHLIGVNGGLPWRLPKELHHFKETTMGQALLMGRKTFDGMNRRVLPG 84
MTK+IIAIWAEDE LIG+ G LPW LPKEL HFK+TT+ QA+LMGR TF+GMN + LP
Sbjct: 1 MTKEIIAIWAEDEAGLIGIAGKLPWYLPKELEHFKKTTLHQAILMGRVTFEGMNCKRLPQ 60

Query: 85 RETIILTKDEQFQADGVTVLNSVEQVIKWFQEHNKTLFIVGGASIYKAFLPYCEAIIKTK 144
R+T+++T++ +Q D V + S+E+V++W+ +KTL+I+GG + +AF Y + IIKT
Sbjct: 61 RQTLVMTRNRDYQVDEVLTMTSIEKVLEWYHAQDKTLYIIGGNKVLEABTSIGYFDRIIKTV 120

Query: 145 VHGKFKGDTYFPDVNLSEFKVISRDYFEKDEQNAHAFTVT 184
+H +FKGDTY P+++ S F S+ ++ +D +N + FTVT
Sbjct: 121 IHHRFKGDTYRPNLDFSHFTQESQTFYARDAKNPYDFTVT 160

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1921 

A DNA sequence (GBSx2030) was identified in S.agalactiae <SEQ ID 5951> which encodes the amino
acid sequence <SEQ ID 5952>. Analysis of this protein sequence reveals the following:

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1577 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAA25221 GB:M33770 thymidylate synthase (EC 2.1.1.45)
[Lactococcus lactis]
Identities = 215/280 (76%) , Positives = 245/280 (86%) , Gaps = 2/280 (0%)



Query: 1 MTKADLLFKDNITKIMSEGVFSEQARPRYKNGEMANSKYITGAFAEYDLSKGEFPITTLR 60
MT AD +FK NI I+ GVFSE ARP+YK+G+MANSKY+TG+F YDL KGEFPITTLR
Sbjct: 1 MTYADQVFKQNIQNILDNGVFSENARPKYKDGQMANSKYVTGSFVTYDLQKGEFPITTLR 60

Query: 61 PIPIKSAIKEIFWIYQDQTNDLAVLNDKYGVTYWNDWEVGHTGTIGQRYGAVVKKHNIIS 120
PIPIKSAIKE+ WIYQDQT++L+VL +KYGV YW +W +G GTIGQRYGA VKK+NII
Sbjct: 61 PIPIKSAIKELWIYQDQTSELSVLEEKYGVKYWGEWGIGD-GTIGQRYGATVKKYNIIG 119

Query: 121 KLLKQLEDNPWNRRNVISLWDYEAFEETEGLLPCAFQTMFDVRRV-NGELYLDATLTQRS 179
KLL+ L NPWNRRN+I+LW YE FEETEGLLPCAFQTMFDVRR +G++YLDATL QRS
Sbjct: 120 KLLEGLAKNPWNRRNIINLWQYEDFEETEGLLPCAFQTMFDVRREKDGQIYLDATLIQRS 179

Query: 180 NDMLVAHHINAMQYVALQMMIAKHFGWRVGKFFYFINNLHIYDNQFEQAQELLKRQPSEC 239
NDMLVAHHINAMQYVALQMMIAKHF W+VGKFFYF+NNLHIYDNQFEQA EL+KR SE
Sbjct: 180 NDMLVAHHINAMQYVALQMMIAKHFSWKVGKFFYFVNNLHIYDNQFEQANELMKRTASEK 239

Query: 240 NPKLVLNVPDGTDFFDIKPDDFALVDYDPIKPQLRFDLAI 279
P+LVLNVPDGT+FFDIKP+DF LVDY+P+KPQL+FDLAI
Sbjct: 240 EPRLVLNVPDGTNFFDIKPEDFELVDYEPVKPQLKFDLAI 279





A related DNA sequence was identified in S.pyogenes <SEQ ID 5953> which encodes the amino acid 
sequence <SEQ ID 5954>. Analysis of this protein sequence reveals the following: 

Possible site: 53

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3131 (Affirmative) < suco

bacterial membrane Certainty=0.0000 (Not Clear) < suco

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 227/279 (81%) , Positives = 251/279 (89%) 



Query: 1 MTKADLLFKDNITKIMSEGVFSEQARPRYKNGEMANSKYITGAFAEYDLSKGEFPITTLR 60
MTKAD +FK NI KI++EG SEQARP+YK+G A+SKYITGAFAEYDL+KGEFPITTLR
Sbjct: 9 MTKADQIFKANIQKIINEGSLSEQARPKYKDGRTAHSKYITGAFAEYDLAKGEFPITTLR 68

Query: 61 PIPIKSAIKEIFWIYQDQTNDLAVLNDKYGVTYWNDWEVGHTGTIGQRYGAVVKKHNIIS 120
PIPIKSAIKE+FWIYQDQ+N L VL KY V YWN+WEV T TIGQRYGAVVKKH+IIS
Sbjct: 69 PIPIKSAIKELFWIYQDQSNSLDVLEAKYNVHYWNEWEVDQTRTIGQRYGAVVKKHDIIS 128

Query: 121 KLLKQLEDNPWNRRNVISLWDYEAFEETEGLLPCAFQTMFDVRRVNGELYLDATLTQRSN 180
K+LKQL +NPWNRRNVISLWDYEAFEET+GLLPCAFQ MFDVRRV +LYLDA+LTQRSN
Sbjct: 129 KILKQLAENPWNRRNVISLWDYEAFEETKGLLPCAFQIMFDVRRVGEDLYLDASLTQRSN 188

Query: 181 DMLVAHHINAMQYVALQMMIAKHFGWRVGKFFYFINNLHIYDNQFEQAQELLKRQPSECN 240
D+LVAHHINAMQYVALQMMIAKHFGW++GKFFYF+NNLHIYDNQF+QAQELLKRQP
Sbjct: 189 DILVAHHINAMQYVALQMMIAKHFGWKIGKFFYFVNNLHIYDNQFDQAQELLKRQPVASQ 248

Query: 241 PKLVLNVPDGTDFFDIKPDDFALVDYDPIKPQLRFDLAI 279
PKLVLNVPD T+FFDIKPDDF L +YDP+KPQL FDLAI
Sbjct: 249 PKLVLNVPDRTNFFDIKPDDFELQNYDPVKPQLHFDLAI 287

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1922 

5 A DNA sequence (GBSx2031) was identified in S.agalactiae <SEQ ID 5955> which encodes the amino 
acid sequence <SEQ ID 5956>. This protein is predicted to be HMG-CoA synthase. Analysis of this protein 
sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0816 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
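
A "Final Results" block of this kind lists one certainty value per candidate location. As an illustration only, the sketch below reads such lines into a dictionary and picks the location with the highest certainty; the helper name `parse_localization` is hypothetical, and the pattern assumes only the line shapes that appear in this text, including spaces that scanning has introduced inside the numbers.

```python
import re

# Location, optional dashes, then "Certainty=<number> (<label>)".
LOC_RE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]*)\)"
)

def parse_localization(lines):
    """Return {location: (certainty, label)} for the lines that match."""
    results = {}
    for line in lines:
        m = LOC_RE.search(line)
        if m:
            location, raw_value, label = m.groups()
            # Remove spaces inside the number, e.g. "0. 0816" becomes "0.0816".
            results[location] = (float(raw_value.replace(" ", "")), label.strip())
    return results

block = [
    "bacterial cytoplasm Certainty=0. 0816 (Affirmative) < suco",
    "bacterial membrane Certainty=0. 0000 (Not Clear) < suco",
    "bacterial outside Certainty=0 . 0000 (Not Clear) < suco",
]
results = parse_localization(block)
best = max(results, key=lambda loc: results[loc][0])
print(best, results[best])
```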


A related DNA sequence was identified in S.pyogenes <SEQ ID 5957> which encodes the amino acid 
sequence <SEQ ID 5958>. Analysis of this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1670 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 260/385 (67%) , Positives = 325/385 (83%) 

Query: 36 MKIGIDKIGFATSQYVLEMTDLAIARQVDPEKFSKGLLLDSLSITPVTEDIVTLAASAAN 95
M IGIDKIGFATSQYVL++ DLA+ARQVDP KFS+GLL++S S+ P+TEDI+TLAASAA+
Sbjct: 14 MTIGIDKIGFATSQYVLKLEDLALARQVDPAKFSQGLLIESFSVAPITEDIITLAASAAD 73

Query: 96 DILSDEDKETIDMVIVATESSIDQSKAASVYVHQLLEIQPFARSFEMKEACYSATAALDY 155
IL+DED+ IDMVI+ATESS DQSKA+++YVH L+ IQPFARSFE+K+ACYSATAALDY
Sbjct: 74 QILTDEDRAKIDMVILATESSTDQSKASAIYVHHLVGIQPFARSFEVKQACYSATAALDY 133

Query: 156 AKLHVEKHPDSKVLVIASDIAKYGIKSTGESTQGAGSIAMLISQNPSILELKEDHLAQTR 215
AKLHV PDS+VLVIASDIA+YG+ S GESTQG+GSIA+L++ NP IL L ED++AQTR
Sbjct: 134 AKLHVASKPDSRVLVIASDIARYGVGSPGESTQGSGSIALLVTANPRILALNEDNVAQTR 193

Query: 216 DIMDFWRPNYSDVPYVNGMFSTKQYLDMLKTTWKVYQKRFNTSLSDYAAFCFHIPFPKLA 275
DIMDFWRPNYS PYV+G++STKQYL+ L+TTW+ YQKR N LSD AA CFHIPFPKLA
Sbjct: 194 DIMDFWRPNYSFTPYVDGIYSTKQYLNCLETTWQAYQKRENLQLSDLAAVCFHIPFPKLA 253

Query: 276 LKGFNKILDNNLDEQKKAELQENFEHSITYSKKIGNCYTGSLYLGLLSLLENSQNLKAGD 335
LKG N I+DN + + + +L E F+ SI+YSK+IGN YTGSLYLGLLSLLENS+ L++GD
Sbjct: 254 LKGLNNIMDNTVPPEHREKLIEAFQASISYSKQIGNIYTGSLYLGLLSLLENSKVLQSGD 313

Query: 336 QIAFFSYGSGAVAEIFTGQLVDGYQNKLQSDRMDQLNKRQKITVTEYEKLFFEKTILDEN 395
+I FFSYGSGAV+E ++GQLV GY L ++R L++R +++V++YE LF+E+ LD+N
Sbjct: 314 KIGFFSYGSGAVSEFYSGQLVAGYDKMLMTNRQALLDQRTRLSVSKYEDLFYEQVQLDDN 373

Query: 396 GNANFNTYRTGTFSLDSICEHQRIY 420
GNANF+ Y TG F+L +I EH+RIY
Sbjct: 374 GNANFDIYLTGKFALTAIKEHRRIY 398

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1923 

A DNA sequence (GBSx2032) was identified in S.agalactiae <SEQ ID 5959> which encodes the amino 
acid sequence <SEQ ID 5960>. This protein is predicted to be HMG-CoA reductase (mvaA). Analysis of 
this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.49 Transmembrane 348 - 364 ( 348 - 364) 
INTEGRAL Likelihood = -1.33 Transmembrane 53 - 69 ( 53 - 69) 
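
Each "INTEGRAL" line gives a likelihood score and the residue range of a predicted transmembrane segment. The sketch below is an illustration only (the `Segment` type and `parse_segments` name are hypothetical); it extracts the score and the first residue range from each line and reports the segment length.

```python
import re
from typing import NamedTuple

class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int

# Captures the score and the first residue range; the range in parentheses is ignored.
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)

def parse_segments(text):
    """Return the predicted segments in the order they appear in the text."""
    return [Segment(float(l), int(s), int(e)) for l, s, e in TM_RE.findall(text)]

text = """
INTEGRAL Likelihood = -1.49 Transmembrane 348 - 364 ( 348 - 364)
INTEGRAL Likelihood = -1.33 Transmembrane 53 - 69 ( 53 - 69)
"""
segments = parse_segments(text)
for seg in segments:
    print(seg.likelihood, seg.start, seg.end, "length", seg.end - seg.start + 1)
```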



Final Results 

bacterial membrane --- Certainty=0.1595 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG02454 GB:AF290098 HMG-CoA reductase [Streptococcus pneumoniae] 
Identities = 266/421 (63%) , Positives = 343/421 (81%) , Gaps = 3/421 (0%) 

Query: 3 KISWTGFSKKSPEERIHYLEEQDFLADSSLEIVTNQDLLSLSLANQMAENVIGRIALPFS 62 

KISW GFSKKS +ER+ L+ Q L+ + + +S+++A+Q++ENV+G +LP+S 

Sbjct: 2 KISWNGFSKKSYQERLELLKAQALLSPERQASLEKDEQMSVWADQLSENWGTFSLPYS 61 

Query: 63 LVPDVLVNGKOTQVPYvTEEPSVVAAASFAAKIIia^SGGFLTTVHNRKMIGQVALyDVQD 122 

LVP+VLVNG+ Y VPYVTEEPSWAAAS+A+KIIKR+GGF VH R+MIGQVALY V + 
Sbjct: 62 LVPEVLVNGQGYTVPYVTEEPSWAAASYASKIIKRAGGFTAQVHQRQMIGQVALYQVAN 121 

Query: 123 SQHTKESILNQKQQLLEIANAAHPSIVKRGGGACDLTIEI KEDFLIVYLMVDTKEAM 179 

+ +E I ++K +LLE+AN A+PSIVKRGGGA DL +E + DFL+VY+ VDT+EAM 
Sbjct: 122 PKIAQEKIASKKAELLELANQAYPSIVKRGGGARDLHVEQIKGEPDFLVVYIHVDTQEAM 181 

Query: 180 GANMVNTMMEALSS PLED I SKGKSLMS I LSNYATESLVTATCRVDLRFLSRQKEEAI KLA 239 

GANM+NTM+EAL LE++S+G+SLM ILSNYAT+SLVTA+CR+ R+LSRQK++ ++A 
Sbjct: 182 GANMLNTMLEALKPVLEELSQGQSLMGILSNYATDSLVTASCRIAFRYLSRQKDQGREIA 241 

Query: 240 QKMTMASQLAQVDPYFASTHNKGIFNGIDAIVIATGNDWRAIEAGAHTYAVKDGQYRGLS 299 

+K+ +ASQ AQ DPYRA+THNKGIFNGIDAI++ATGNDWRAIEAGAH +A +DG+Y+GLS 
Sbjct: 242 EKIALASQFAQADPYRAATHNKGIFNGIDAILIATGNDWRA1EAGAHAFASRDGRYQGLS 301 

Query: 300 RWSYKVDDNCLEGTLTLPMPVATKGGSIGINPSVHLAHDLLGRPNAKELASIILSIGLAQ 359 

W+ ++ L G +TLPMPVATKGGSIG+NP V L+HDLLG P+A+ELA II+SIGLAQ 
Sbjct: 302 CWTLDLEREELVGEMTLPMPVATKGGSIGLNPRVALSHDLLGNPSARELAQIIVSIGLAQ 361 

Query: 360 NFAALKALVSTGIQAGHMKLQAKSLALLAGAKEEQISEVVKQLLDSKHMNLETAQKI VTSIKL 420 

NFAALKALVSTGIQ GHMKLQAKSLALLAGA E +++ +V++L+ K NLETAQ+ + L 
Sbjct: 362 NFAALKALVSTGIQQGHMKLQAKSLALLAGASESEVAPLVERLISDKTFNLETAQRYLENL 422 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5961> which encodes the amino acid 
sequence <SEQ ID 5962>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3929 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 257/422 (60%) , Positives = 330/422 (77%) 

Query: 2 TKISWTGFSKKSPEERIHYLEEQDFLADSSLEIVTNQDLLSLSLANQMAENVIGRIALPF 61 
T ++W+GFSKK+ EER+ +E+ L +L + LL + ANQM ENV+GR+ALPF 
Sbjct: 4 raUSTOSGFSKKTFEERLQLIEKFKLmAENMQLKTDVLLPIQTMtQMTEIWLGRL^PF 63 

Query: 62 SLVPDVLTOGKOTQVPYVTEEPSWAAASFJWyCIIKRSGGFLTTVHNRKMIGQVALYDVQ 121 

S+ PD LVNG YQ+P+VTEEPSWAAASFAAK+IKRSGGF NR+MIGQ+ LYD+ 

Sbjct: 64 SIAPDFLVNGSTyQMPFVTEEPSWAAASFAAKLIKRSGGFKAQTLNRQMIGQIVLYDID 123 

Query: 122 DSQHTKESIENQKQQLLEIANAAHPSIVKRGGGACDLTIEIKEDFLIVYLMVDTKEAMGA 181 

+ K +IL++ ++L+ +AN A+PS IVKRGGGA + +E K +FLI YL VDT+EAMGA 
Sbjct: 124 QIDNAKAAILHKTKKLIALANKAYPSIVKRGGGARTIHLEEKGEFLIFYLTVDTQEAMGA 183 

Query: 182 NMVOTMMEALSSPLEDISKGKSLMSILSlSrYATESLVTATCRVDLRFLSRQKEEAIKLAQK 241 

NMVNTMMEAL L +SKG LM+ILSNYATESLVT +C + +R L K ++++IAQK 
Sbjct: 184 MTO1TMMEALVPDLTRLSKGHCLMAILSNYATESLVTTSCEIPVRLLDHDKTKSLQLAQK 243 

Query: 242 MTMASQLAQVDPYRASTHNKGIFNGIDAIVIATGNDWRAIEAGAHTYAVKDGQYRGLSRW 301 

+ +AS+LAQVDPYRA+THNKGIFNGIDA+V+ATGNDWRAIEAGAH YA ++G Y+GLS+W 
Sbjct: 244 IELASRLAQVDPYRATTHNKGIFNGIDAWIATGNDWRAIEAGAHAYASRNGSYQGLSQW 303 

Query: 302 SYKVDDNGLEGTLTLPMPVATKGGSIGINPSVHLAHDLLGRPNAKELASIILSIGLAQNF 361 
+ D L G +TLPMP+A+KGGSIG+NP+V +AHDLL +P+AK LA +I S+GLAQNF 

Sbjct: 304 HFDQDKQVLLGQMTLPMPIASKGGSIGLNPTVSIAHDLLNQPDAKTLAQLIASVGLAQNF 363 

Query: 362 AALKALVSTGIQAGHMKLQAKSLALLAGAKEEQISEVVKQLLDSKHMNLETAQKIVNKLT 421 
AALKAL S+GIQAGHMKL AKSLALLAGA +++I+ +V LL K +NLE A +++L 
Sbjct: 364 AALKALTSSGIQAGHMKLHAKSLALIAGATQDEIAPLVNALLADKPINLEKAHFYLSQLR 423 

Query: 422 KS 423 
+S 

Sbjct: 424 QS 425 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
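
In these alignments the middle line carries the comparison: a letter marks an identical residue in both sequences and "+" marks a conservative (positive) substitution. The sketch below is a hypothetical checker, not part of the original analysis. It counts identities and positives for one column-aligned block and lists columns where the middle line disagrees with the two sequences, which is a common sign of scanning damage. It requires the three lines to be the same length, so it applies only to blocks whose column spacing has survived, such as the final two-residue block above.

```python
def check_block(query, midline, subject):
    """Return (identities, positives, problem_columns) for one aligned block."""
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("block is not column-aligned")
    identities = positives = 0
    problems = []
    for i, (q, m, s) in enumerate(zip(query, midline, subject)):
        if m.isalpha():
            # A letter in the middle line claims q == s == that letter.
            identities += 1
            positives += 1
            if not (q == m == s):
                problems.append(i)
        elif m == "+":
            # "+" claims a conservative substitution, so q and s must differ.
            positives += 1
            if q == s:
                problems.append(i)
    return identities, positives, problems

# Final block of the alignment above: Query 422-423 against Sbjct 424-425.
print(check_block("KS", "+S", "QS"))
```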

Example 1924 

A DNA sequence (GBSx2033) was identified in S.agalactiae <SEQ ID 5963> which encodes the amino 
acid sequence <SEQ ID 5964>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2355 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5965> which encodes the amino acid 
sequence <SEQ ID 5966>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2687 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 76/138 (55%) , Positives = 100/138 (72%) , Gaps = 2/138 (1%) 



Query: 7 PKWEELPELDLYLDQVLLYVNQLINPKTITNDKLLTASMINNYVKHNYISKPIKKKYNRR 66 
P W++LP+LDLYLDQVLLYVNQ + ++++K LTASMINNYVKH Y++KPIKKKY ++ 
Sbjct: 7 PYHKDLPDLDLYLDQVLLYVNQCTDFSEVSDNKSLTASMINNYVKHGYVTKPIKKKYQKQ 66 

Query: 67 QVARr.IVITAFKQVFAIQEISQTLELLTM3NHSEEATOGFJUiCMNKEE--VHDIjPPWIS 124 

Q+ARLI I+ FK VF IQ+IS+ LE L A SE YN F C N++ D+PP+V 
Sbjct: 67 QLARLIAISLFKTVFPIQDISRVLEELQAQADSESIjYOTFVTCWNQKAPIEEDIPPIVQV 126 

Query: 125 ACQTLNLYQETQKLVLEL 142 

ACQT+ Y +T L+ E+ 
Sbjct: 127 ACQTVKDYHKTIYLLQEV 144 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
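
Each Query and Sbjct line carries its own consistency check: the number of residues between the start and end coordinates must equal end - start + 1, with gap characters excluded. The sketch below (the function name `coordinate_check` is hypothetical) applies that rule to one line. The last input is a deliberately damaged, made-up variant of the final Query line, used only to show a failing case.

```python
import re

LINE_RE = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(.+?)\s+(\d+)\s*$")

def coordinate_check(line):
    """Return (label, expected_residues, counted_residues, consistent) or None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    label, start, seq, end = m.group(1), int(m.group(2)), m.group(3), int(m.group(4))
    # Count letters only, so gap characters and stray spaces are ignored.
    counted = sum(ch.isalpha() for ch in seq)
    expected = end - start + 1
    return label, expected, counted, expected == counted

for line in (
    "Query: 125 ACQTLNLYQETQKLVLEL 142",
    "Sbjct: 127 ACQTVKDYHKTIYLLQEV 144",
    "Query: 125 ACQTLNLYQETQKIiVLEL 142",  # hypothetical damaged line, one letter too many
):
    print(coordinate_check(line))
```

A line that fails this check has gained or lost characters, for example where a scanner has read one letter as two.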

Example 1925 

A DNA sequence (GBSx2034) was identified in S.agalactiae <SEQ ID 5967> which encodes the amino 
acid sequence <SEQ ID 5968>. This protein is predicted to be hemolysin iii. Analysis of this protein 
sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -9.08 Transmembrane 142 - 158 ( 140 - 165) 
INTEGRAL Likelihood = -6.79 Transmembrane 26 - 42 ( 19 - 44) 
INTEGRAL Likelihood = -5.63 Transmembrane 200 - 216 ( 196 - 217) 
INTEGRAL Likelihood = -5.41 Transmembrane 104 - 120 ( 102 - 121) 
INTEGRAL Likelihood = -3.98 Transmembrane 51 - 67 ( 49 - 69) 
INTEGRAL Likelihood = -1.86 Transmembrane 172 - 188 ( 169 - 188) 


Final Results 

bacterial membrane --- Certainty=0.4630 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9951> which encodes amino acid sequence <SEQ ID 9952> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA58877 GB:X84058 novel hemolytic factor [Bacillus cereus] 
Identities = 79/204 (38%), Positives = 132/204 (63%), Gaps = 4/204 (1%) 

Query: 17 EELANSITHAVGALLMLILLPITAVYSHNHFGLQAALGTSIFVTSLFLMFLSSSIYHSMT 76 

EE+AN+ITH +GA+L + L I +++ H A + +++ S+FL++L S++ HS+ 
Sbjct: 14 EEIANAITHGIGAILSIPALIILIIHASKHGTASAWAFTVYGVSMFLLYLFSTLLHSIH 73 

Query: 77 YNSLQKYVLRMIDHSMIYIAIAGSYTPVALSLIGGWLGYLIIFLQWGITLFGILYKIFAP 136 

+ ++ k + ++DHS IY+ IAG+YTP h + G LG+ ++ + W + + GI++KIF 
Sbjct: 74 HPKVEK-LFTILDHSAIYLLIAGTYTPFLLITLRGPLGWTLLAIIWTLAIGGIIFKIFFV 132 

Query: 137 KINDKFSLvLYLIMGWLVIF-IFPAIITKTGPAFWGLLIAGGICyriGALFYA-RKRPYD 194 

+ K S + Y+IMGWL+I IP TG F LLLAGGI Y++GA+F+ K P++ 

Sbjct: 133 RRFIKASTLCYIIMGWLIIVAIKPLYENLTGHGF-SLLLAGGILYSVGAIFFLWEKLPFN 191 

Query: 195 HMIWHLFILLASILQYIGIVYFML 218 

H IWHLF+L S + + +++++L 
Sbjct: 192 HAIWHLFVLGGSAMMFFCVLFYVL 215 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5969> which encodes the amino acid 
sequence <SEQ ID 5970>. Analysis of this protein sequence reveals the following: 

Possible site: 41 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-10.51 Transmembrane 144 - 160 ( 138 - 163) 
INTEGRAL Likelihood = -9.87 Transmembrane 49 - 65 ( 45 - 71) 
INTEGRAL Likelihood = -7.11 Transmembrane 198 - 214 ( 193 - 215) 
INTEGRAL Likelihood = -6.16 Transmembrane 102 - 118 ( 100 - 120) 
INTEGRAL Likelihood = -2.97 Transmembrane 20 - 36 ( 20 - 41) 
INTEGRAL Likelihood = -1.01 Transmembrane 167 - 183 ( 167 - 185) 

Final Results 

bacterial membrane --- Certainty=0.5203 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:CAA58877 GB:X84058 novel hemolytic factor [Bacillus cereus] 
Identities = 82/204 (40%) , Positives = 128/204 (62%) , Gaps = 4/204 (1%) 

Query: 15 EEVANSVTHAIGAFAMLILLPISASYAYQTYDLKAAIGISIFVISLFLMFLSSTIYHSMA 74 

EE+AN++TH IGA + LI +A + A + +++ +S+FL++L ST+ HS+ 

Sbjct: 14 EEIANAITHGIGAILSIPALIILIIHASKHGTASAWAFTVYGVSMFLLYLFSTLLHSIH 73 

Query: 75 YGSVHKYILRIIDHSMIYIAIAGSYTPVALSLVSGWLGYIIIVLQWGITLFGILYKIFAK 134 

+ V K + I+DHS IY+ IAG+YTP h + G LG+ ++ + W + + GI++KIF 
Sbjct: 74 HPKVEK-LFTILDHSAIYLLIAGTYTPFLLITLRGPLGWTLLAIIWTLAIGGIIFKIFFV 132 

Query: 135 RINEKFSLMLYIVMGWL- WFILPVIIQKTSLAFGLLMLFGGLSYTIGAVFYA-KKRPYF 192 

R K S + YI+MGWL +V I P+ T F LL L GG+ Y++GA+F+ +K P+ 
Sbjct: 133 RRFIKASTLCYIIMGWLIIVAIKPLYENLTGHGFSLL-LAGGILYSVGAIFFLWEKLPFN 191 

Query: 193 HMIWHLFILLASALQFIAITFFML 216 

H IWHLF+L SA+ F + F++L 
Sbjct: 192 HAIWHLFVLGGSAMMFFCVLFYVL 215 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 153/213 (71%) , Positives = 181/213 (84%) 

Query: 6 SIKLSPQLSFGEELANSITHAVGALLMLILLPITAVYSHNHFGLQAALGTSIFVTSLFLM 65 

+ K S LSF EE+ANS+THA+GA MLILLPI+A Y++ + L+AA+G SIFV SLFLM 
Sbjct: 4 TFKQSLPLSFSEEVANSVTHAIGAFAMLILLPISASYAYQTYDLKAAIGISIFVISLFLM 63 

Query: 66 FLSSSIYHSMTYNSLQKYVLRMIDHSMIYIAIAGSYTPVALSLIGGWLGYLIIFLQWGIT 125 

FLSS+IYHSM Y S+ KY+LR+IDHSMIYIAIAGSYTPVALSL+ GWLGY+II LQWGIT 
Sbjct: 64 FLSSTIYHSMAYGSVHKYILRIIDHSMIYIAIAGSYTPVALSLVSGWLGYIIIVLQWGIT 123 

Query: 126 LFGILYKIFAPKINDKFSLVLYLIMGWLVIFIFPAIITKTGPAFWGLLLAGGICYTIGAL 185 

LFGILYKIFA +IN+KFSL+LY++MGWLV+FI P II KT AF L+L GG+ YTIGA+ 
Sbjct: 124 LFGILYKIFAKRINEKFSLMLYIVMGWLVVFILPVIIQKTSLAFGLLMLFGGLSYTIGAV 183 

Query: 186 FYARKRPYDHMIWHLFILLASILQYIGIVYFML 218 

FYA+KRPY HMIWHLFILLAS LQ+I I +FML 
Sbjct: 184 FYAKKRPYFHMIWHLFILLASALQFIAITFFML 216 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1926 

A DNA sequence (GBSx2035) was identified in S.agalactiae <SEQ ID 5971> which encodes the amino 
acid sequence <SEQ ID 5972>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3641 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12492 GB:Z99107 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 81/302 (26%) , Positives = 157/302 (51%) , Gaps = 10/302 (3%) 

Query: 1 MKSAYIFFNPKSGKDECALAKEVKSYLIEHDFQDDY-VRIITPSSVEEAVALAKKASEDH 59 

MK A I +NP SG++ + K+ + +++ Q Y + +A AK+A+ 

Sbjct: 1 MKRARI IYNPTSGRE 1 FKKHLAQVLQKFEQAGYETSTHATTCAGDATHAAKEAALRE 57 

Query: 60 IDLVIPLGGDGTINKICGGVYAGGAYPTIGLVPAGTVNNFSKALNIPQERNL-ALENLLN 118 

DL+I GGDGTIN++ G+ PT+G++P GT N+F++AL IP+E L A + ++N 

Sbjct: 58 FDLIIAAGGDGTINEWNGLAPLDNRPTLGVIPVGTTNDFARALGIPREDILKAADTVIN 117 

Query: 119 GHVKSVDICKViroDYMISSLTLGLIiADIAANVTSEMKRKLGPFAFLGDAYRILKRNRSYS 178 

G + +DI +VN Y 1+ G L ++ +V S++K LG A+ +L R 

Sbjct: 118 GVARPIDIGQVNGQYFINIAGGGRLTELTYDVPSKLKTMLGQIiAYYIiKGMEMIjPSLRPTE 177 

Query: 179 ITLAYDNNVRSLRTRLLLITMTNSIAGMPAFSPEATIDDGLFRVYTMEHIHFFKLLLHLR 238 

+ + YD + L L+T+TNS+ G +P+++++DG+F +++ + + + 

Sbjct: 178 VEIEYDGKLFQGEIMLFLWLTNSVGGFEKI^PDSSIMDGMFDLMILKKANIiAEFIRVAT 237 

Query: 239 QFRKGDFSQAKEIKHFHTNNLTISTFKRKKSAIPKVRIDGDPGDQLPVKVEVIPKALKFI 298 

+G+ + I + N + ++ ++ ++ +DG+ G LP + + + + + 

Sbjct: 238 MALRGEHINDQHIIYTKANRVKVNVSEKM QLNLDGEYGGMLPGEFVNLYRHIHW 292 


Query: 299 IP 300 
+P 

Sbjct: 293 MP 294 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5119> which encodes the amino acid 
sequence <SEQ ID 5120>. Analysis of this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.4258 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 172/300 (57%) , Positives = 229/300 (76%) 

Query: 1 MKSAYIFFNPKSGKDEQALAKEVKSYLIEHDFQDDYVRIITPSSVEEAVALAKKASEDHI 60 

MK+ IF+NP SGK E LA++VK Y +H F +D V++ITP ++A LAK+A++D I 
Sbjct: 1 MKTWIFYNPNSGKKESQLARQVKDYFCQHGFSEDSVKVITPKDADQAFQLAKQAAKDKI 60 

Query: 61 DLVIPLGGDGTINKICGGVYAGGAYPTIGLVPAGTVNNFSKAmiPQERNLALENLLNGH 120 

DLVIPLGGDGT+NKI GG+Y GGA+ IGLVP+GTVNNF+KA++IP + AL+ +L G 
Sbjct: 61 DLVIPLGGDGTIiNKIIGGIYEGGAHCIjIGLVPSGTVNNFAKAMHIPLQITEALDTILTGQ 120 

Query: 121 VKSVDICroMDDYMISSLTLGLLADIAANVTSEMKRKLGPFAFLGDAYRILKRNRSYSIT 180 

+K VDICK N YMISSLTLGLLADIAA+VT+E KR+ GP AFL D+ RILKRNRSY+I+ 
Sbjct: 121 IKQVDICKANQQYMISSLTLGLLADIAADVTAEEKRRFGPLAFLKDSIRILKRNRSYAIS 180 

Query: 181 LAYDNNVRSLRTRLLLITMTNSIAGMPAFSPEATIDDGLFRVYTMEHIHFFKLLLHLRQF 240 

L N+ L+T+ LLITMTN+IAG P+FSP A DDG F+VYTM+ + FFK L H+ F 
Sbjct: 181 LISHNHRIHLKTKFLLITMTNTIAGFPSFSPGAQADDGYFQVYTMKKVSFFKFLVJHINDF 240 

Query: 241 RKGDFSQAKEIKHFHTNNLTISTFKRKKSAIPKVRIDGDPGDQLPVKVEVIPKALKFIIP 300 

++GDFS+A+EI HF N L++ +K++ +P+ RIDGD D LP+++++IPKA+ I+P 
Sbjct: 241 KQGDFSKAEEISHFQANTLSLLPQAKKQAILPRTRIDGDKSDYLPIQLDIIPKAVSIIVP 300 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1927 

A DNA sequence (GBSx2036) was identified in S.agalactiae <SEQ ID 5973> which encodes the amino 
acid sequence <SEQ ID 5974>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3628 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB10885 GB:AB010693 gene_id:K21C13.21~pir||T04769~strong 
similarity to unknown protein [Arabidopsis thaliana] 
Identities = 85/291 (29%) , Positives = 150/291 (51%) , Gaps = 28/291 (9%) 

Query: 10 DQEWEVP VESGRYHMIVGEFCPYAQRPQI ARQLLGLDKHI S I SFVDDV 57 

D + + P ESGRYH+ + CP+A R ++ GLD+ I+ S V + 

Sbjct: 29 DPDSQFPAESGRYHLYI SYACPWACRCLS YLKI KGLDEAITFSS VHAIWGRTKETDDHRG 88 

Query: 58 PSDIGIiIFSQPEQVTGAKSLRDIYHLTDPTYQGPYTIPIIiIDKTDNRIVCKESADL 113 

SD L ++P+ + GAKS+R++Y + P Y+G YT+P+L DK +V ES+++ 
Sbjct: 89 WVPPDSDTELPGAEPDYLNGAKSTOELYEIASPNYEGKYTVPVLWDKia.KTVVNNESSEI 148 

Query: 114 LRLFTTDFSDLHQEDAPVLFSQETASLIDNDIKDINKNFQSIMYKLAFLDKQADYDTYSK 173 

+R+F T+F+ + + + L+ +1+ + + +YK F KQ Y+ 

Sbjct: 149 IRMFNTEFNGIAKTPSLDLYPSHLRDVINETNGWVCTGINNGVYKCEFARKQEPYNEAVN 208 


Query: 174 EFFTFLDQKEHLLGQRPFLLGDNLSEVDIHFFTPLVRWDIAGRDLLLLNQKALEDYPNIF 233 

+ + +D+ E +LG++ ++ G+ +E DI F L+R+D N++ h +YPNIF 

Sbjct: 209 QLYEAVDRCEEVLGKQRYICGNTFTFJU3IRLFVTLIRFDEVYAVHFKCNKRLLREYPNIF 268 

Query: 234 SWAKTLYNDFNLKTLTNPQSIKNNYY LGKFGRAVRHHTIVPTGPNM 279 

++ K +Y + + N + IK +YY + FG I+P GPN+ 

Sbjct: 269 NYIKDIYQIHGMSSTVNMEHIKQHYYGSHPTINPFG IIPHGPNI 312 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1928 

A DNA sequence (GBSx2037) was identified in S.agalactiae <SEQ ID 5975> which encodes the amino 
acid sequence <SEQ ID 5976>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2647 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07793 GB:AB037666 hypothetical protein [Streptomyces sp. CL190] 
Identities = 127/331 (38%) , Positives = 194/331 (58%) , Gaps = 9/331 (2%) 

Query: 4 RKDDHIKYALKYQSHY NSFDDIELIHSSLPKYNVNDIDLSTHFAGQSFEFPFYINAM 60 

RKDDH++ A++ + + N FDD+ +H +L + D+ L+T FAG S++ P YINAM 
Sbjct: 6 RKDDHVRLAIEQHNAHSGRNQFDDVSFVHHAIAGIDRPDVSIATSFAGISWQVPIYINAM 65 


Query: 61 TGGSEKGKAVITOKIAQVAQATGI VMVTGSYSAALKNDE--DDSYPTTDLYPDLKLATNIG 118 

TGGSEK +N LA A+ TG+ + +GS +A +K+ D D P+ + NI 

Sbjct: 66 TGGSEKTGLINRDrATAARETGVPIASGSMNAYIKDPSCADTFRVLRDENPNGFVIANIN 125 

Query: 119 LDKPVPAAESTVKAMNPIFLQVHVNVMQELLMPEGEREFHMWRSHLKEYVDNIQCPLILK 178 

V A+ + + LQ+H+N QE MPEG+R F W +++ + P+I+K 
Sbjct: 126 ATTTVDNAQRAIDLIEANALQIHINTAQETPMPEGDRSFASWVPQIEKIAAAVDIPVIVK 185 

Query: 179 EVGFGMDLQSIKDAYDIGITTVDISGRGGTSFAYIENQRGR--DRSYLNTWGQTTAQSLI 236 

EVG G+ Q+I D+G+ D+SGRGGT FA IEN R D ++L+ WGQ+TA L+ 
Sbjct: 186 EVGNGLSRQTILLLADLGVQAADVSGRGGTDFARIENGRRELGDYAFLHGWGQSTAACLL 245 

Query: 237 NAQSMMDKM31LASGGIRHPLDMVKCLVLGAKAVGLSRTWliELWRYPVDDVIAILNSWK 296 
+AQ + + +LASGG+RHPLD+V+ L LGA+AVG S L + VD+IL+W 

Sbjct: 246 DAQDI--SLPVLASGGVRHPLDVVRALALGARAVGSSAGFLRTLMDDGVDALITKLTTWL 303 

Query: 297 EDLRMIMCALNCKKITDLRQVNYILYGQLKE 327 
+L+ L+ DL++ +L+G+L++ 
Sbjct: 304 DQLAALQTMLGARTPADLTRCDVLLHGELRD 334 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5977> which encodes the amino acid 
sequence <SEQ ID 5978>. Analysis of this protein sequence reveals the following: 

Possible site: 51 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2823 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 244/329 (74%) , Positives = 284/329 (86%) 

Query: 1 MTNRKDDHIKYALKYQSHYNSFDDIELIHSSLPKYNVNDIDLSTHFAGQSFEFPFYINAM 60 

MTNRKDDHIKYALKYQS YN+FDDIELIH SLP Y+++DIDLSTHFAGQ F+FPFYINAM 
Sbjct: 31 MTNRKDDHIKYALKYQSPYNAFDDIELIHHSLPSYDLSDIDLSTHFAGQDFDFPFYINAM 90 

Query: 61 TGGSEKGKAVNHKLAQVAQATGIVMVTGSYSAALKNDEDDSYPTTDLYPDLKLATNIGLD 120 
TGGS+KGKAVN KLA+VA ATGIVMVTGSYSAALKN DDSY ++ +LKLATNIGLD 

Sbjct: 91 TGGSQKGKAVNEKIAKVAAATGIVMVTGSYSAALKNPNDDSYRLHEVADNLKLATNIGLD 150 

Query: 121 KPVPAAESTVKAMNPIFLQVHVNVMQELLMPEGEREFHMWRSHLKEYVDNIQCPLILKEV 180 
KPV + TV+ M P+FLQVHVNVMQELLMPEGER FH W+ HL EY I P+ILKEV 
Sbjct: 151 KPVALGQQTVQEMQPLFLQVHVNVMQELLMPEGERVFHTWKKHLAEYASQIPVPVILKEV 210 

Query: 181 GFGMDLQSIKDAYDIGITTVDISGRGGTSFAYIENQRGRDRSYLNTWGQTTAQSLINAQS 240 

GFGMD+ SIK A+D+GI T DISGRGGTSFAYIENQRG DRSYLN WGQTT Q L+NAQ 
Sbjct: 211 GFGMDVVSIKLAHDLGIQTFDISGRGGTSFAYIENQRGGDRSYLNDWGQTTVQCLLNAQG 270 

Query: 241 ^DK^ILASGGIRHPLDMVKCLVLGAKAVGLSRTVLELVERYPVDDVIAILNSWKEDLR 300 

+MD+++IIASGG+RHPLDM+KC VLGA+AVGIiSRTVIiELvE+YP + VIAI+N WKE+L+ 
Sbjct: 271 LMDQVEILASGGVRHPLDMIKCFVLGARAVGLSRTVLEI.VEKYPTERVIAIWGWKEELK 330 

Query: 301 MIMCALNCKKITDLRQVNYILYGQLKEAN 329 

+IMCAL+CK I +L+ V+Y+LYG+L++ N 
Sbjct: 331 IIMCALDCKTIKELKGVDYLLYGRLQQVN 359 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1929 

A DNA sequence (GBSx2038) was identified in S.agalactiae <SEQ ID 5979> which encodes the amino 
acid sequence <SEQ ID 5980>. This protein is predicted to be phosphomevalonate kinase. Analysis of this 
protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0785 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG02457 GB:AF290099 phosphomevalonate kinase [Streptococcus pneumoniae] 
Identities = 170/330 (51%) , Positives = 233/330 (70%) , Gaps = 1/330 (0%) 

Query: 1 MVKVQTGGKLYIAGEYAILYPGQVAILKNVPIYMTAIATFADNYSLYSDMFNYTASLQPD 60 
M+ V+T GKLY AGEYAIL PGQ+A++K++PIYM A F+D+Y +YSDMF++ L+P+ 

Sbjct: 1 MIAVKTCGKLYWAGEYAILEPGQLALIKDIPIYMRAEIAFSDSYRIYSDMFDFAVDLRPN 60 

Query: 61 KQYSLIQETILLMEEWLINFGKNIKPIHLEITGKLERYGLKFGIGSSGSVVVLTIKAMAA 120 
YSLIQETI LM ++L G+N++P L+I GK+ER G KFG+GSSGSVWL +KA+ A 
Sbjct: 61 PDYSLIQETIALMGDFLAWGQNLRPFSLKICGKMEREGKKFGLGSSGSWvLWKALLA 120 

Query: 121 LYEIEMPSDLLFKLSAYVLLKRGDNGSMGDIACIAYEHLISYSAFDRRAVSKMIETKPLE 180 

LY + + +LLFKL++ VLLKRGDNGSMGD+ACI E h+ Y +FDR+ + +E + L 
Sbjct: 121 LYNLSVDQNLLFKLTSAVLLKRGDNGSMGDIACIVAEDLVLYQSFDRQKAAAWLEEEOTiA 180 


Query: 181 QVLEAEWGYRITKIQALLEMDFLVGWTMQPSISKEMINIVKSTITQRFLDDTKYQWQLL 240 

VLE +WG+ LE DFLVGWT + + + S M+ +K I Q FL +K W L+ 

Sbjct: 181 TVLERDWGFFISQVKPTLECDFLVGWTKEVAVSSHMVQQIKQNINQNFLSSSICETWSLV 240 

Query: 241 SAFKEGDKFAIKRCLEEISLLLFNLHPSIYTDKLQKLKEASKGLDIVTKSSGSGGGDCGI 300 

A ++G E + +E S LL L IYT L++LKEAS+ L V KSSG+GGGDCGI 
Sbjct: 241 EALEQGKAEKVIEQVEVASKLLEGLSTDIYTPLLRQLKEASQDLQAVAKSSGAGGGDCGI 300 

Query: 301 AISFN-KNDNQTLIKRWESAGIELLSKETL 329 
A+SF+ ++ TL RW GIELL +E + 

Sbjct: 301 ALSFDAQSSRNTLKNRWADLGIELLYQERI 330 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5981> which encodes the amino acid 
sequence <SEQ ID 5982>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2669 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 171/325 (52%) , Positives = 227/325 (69%) , Gaps = 2/325 (0%) 



Query: 4 VQTGGKLYIAGEYAILYPGQVAILKNVPIYMTAIATFADNYSLYSDMFNYTASLQPDKQY 63
VQTGGKLY+ GEYAIL PGQ A++ +P+ MTA + A + L SDMF++ A + PD Y
Sbjct: 22 VQTGGKLYLTGEYAILTPGQKALIHFIPLMMTAEISPAAHIQLASDMFSHKAGMTPDASY 81

Query: 64 SLIQETILLMEEWLINFGKNIKPIHLEITGKLERYGLKFGIGSSGSVVVLTIKAMAALYE 123
+LIQ T+ ++L ++P L ITGK+ER G KFGIGSSGSV +LT+KA++A Y+
Sbjct: 82 ALIQATVKTFADYLGQSIDQLEPFSLIITGKMERDGKKFGIGSSGSVTLLTLKALSAYYQ 141

Query: 124 IEMPSDLLFKLSAYVLLKRGDNGSMGDIACIAYEHLISYSAFDRRAVSKMIETKPLEQVL 183
I + +LLFKL+AY LLK+GDNGSMGDIACIAY+ L++Y++FDR VS ++T PL+++L
Sbjct: 142 ITLTPELLFKLAAYTLLKQGDNGSMGDIACIAYQTLVAYTSFDREQVSNWLQTMPLKKLL 201

Query: 184 EAEWGYRITKIQALLEMDFLVGWTMQPSISKEMINIVKSTITQRFLDDTKYQWQ-LLSA 242
+WGY I IQ L DFLVGWT P+IS++MI V ++IT FL T YQ+ Q + A
Sbjct: 202 VKDWGYHIQVIQPALPCDFLVGWTKIPAISRQMIQQVTASITPAFL-RTSYQLTQSAMVA 260

Query: 243 FKEGDKEAIKRCLEEISLLLFNLHPSIYTDKLQKLKEASKGLDIVTKSSGSGGGDCGIAI 302
+EG KE +K+ L S LL LHP+IY KL L A + D V KSSGSGGGDCGIA+
Sbjct: 261 LQEGHKEELKKSLAGASHLLKELHPAIYHPKLVTLVAACQKQDAVAKSSGSGGGDCGIAL 320

Query: 303 SFNKNDNQTLIKRWESAGIELLSKE 327
+FN++ TLI +W+ A I LL +E
Sbjct: 321 AFNQDARDTLISKWQEADIALLYQE 345

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1930 

A DNA sequence (GBSx2039) was identified in S.agalactiae <SEQ ID 5983> which encodes the amino 
acid sequence <SEQ ID 5984>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -1.75 Transmembrane 20 - 36 ( 18 - 36) 

Final Results 

bacterial membrane --- Certainty=0.1702 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1931 

A DNA sequence (GBSx2040) was identified in S.agalactiae <SEQ ID 5985> which encodes the amino 
acid sequence <SEQ ID 5986>. This protein is predicted to be mevalonate diphosphate decarboxylase. 
Analysis of this protein sequence reveals the following: 

Possible site: 25 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1557 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG02456 GB:AF290099 mevalonate diphosphate decarboxylase 
[Streptococcus pneumoniae] 
Identities = 219/312 (70%) , Positives = 264/312 (84%) 

Query: 1 MDGKSISVKSYANIAIIKYWGKADAEKMIPATSSISLTLENMYTETRLTALGKDAKKDEF 60 

MD + ++V+SYANIAIIKYWGK ++M+PATSSISLTLENMYTET L+ L + DEF 
Sbjct: 1 MDREPVTTOSYANIAIIKyWGKKKEKEMVPATSSISLTLENIWTETTLSPIjPANvTADEF 60 

Query: 61 YISGVLQ^HEHDKMSAILDRFRQNRSGEVKIETTNMPTAAGLSSSSSGLSALVKACND 120 

YI+G LQN+ EH KMS I+DR+R GFV+I+T NNMPTAAGLSSSSSGLSALVKACN 
Sbjct: 61 YINGQLQNEVEHAKMSKIIDRYRPAGEGFVRIDTQNNMPTAAGLSSSSSGLSALVKACNA 120 

Query: 121 FFGTNLSQSQIAQEAKFASGSSSRSFFGPVAAWDKDSGDIYKOTTOLDLAMIMLVlJSroKR 180 

+F L +SQIAQEAKFASGSSSRSF+GP+ AWDKDSG+IY V T+L LAMIMLVL DK+ 
Sbjct: 121 YFKLGLDRSQLAQEAKFASGSSSRSFYGPLGAWDKDSGEIYPVETDLKLAMIMLVLEDKK 180 

Query: 181 KPISSREGMKICTETSTTFNEWVRQSEQDYQDMLWLKNM3FQKVGQLTERNALAMHSTT 240 

KPISSR+GMK+C ETSTTF++WVRQSE+DYQDML+YLK NDF K+G+LTE+NALAMH+TT 
Sbjct: 181 KPISSRDGMKLCVETSTTFDDWVRQSEKDYQDMLIYLKENDFAKIGELTEKNAIAMHATT 240 

Query: 241 KTATPAFSYLTEETYKAMDVVKKLREKGHECYYTMDAGPNVKVLCLRQDLEALAAILEKD 300 
KTA+PAFSYLT+ +Y+AM V++LREKG CY+TMDAGPNVKV C +DLE L+ I + 

Sbjct: 241 KTASPAFSYLTDASYEAMAFVRQLREKGEACYFTMDAGPNVKVFCQEKDLEHLSEIFGQR 300 

Query: 301 YRIIVSTTKELA 312 
YR+IVS TK+L+ 
Sbjct: 301 YRLIVSKTKDLS 312 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5987> which encodes the amino acid 
sequence <SEQ ID 5988>. Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1271 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 221/313 (70%) , Positives = 258/313 (81%) 

Query: 1 MDGKSISVKSYANIAIIKYWGKADAEKMIPATSSISLTLENMYTETRLTALGKDAKKDEF 60 

+D I+V SYANIAIIKYWGK + KMIP+TSSISLTLENM+T T ++ L A D+F 
Sbjct: 1 VDPNVITVTSYANIAIIKYWGKENQAKMIPSTSSISLTLENMFTTTSVSFLPDTATSDQF 60 

Query: 61 YISGVLQNDHEHDKMSAILDRFRQNRSGFVKIETTNNMPTAAGLSSSSSGLSALVKACND 120 
YI+G+LQND EH K+SAI+D+FRQ FVK+ET NNMPTAAGLSSSSSGLSALVKAC+ 

Sbjct: 61 YINGILQNDEEHTKISAIIDQFRQPGQAFVKMETQNNMPTAAGLSSSSSGLSALVKACDQ 120 

Query: 121 FFGimSQSQIAQEAKFASGSSSRSFFGPVAATOKDSGDIYKVHTNLDIAMIMLVLNDKR 180 
F T L Q IAQ+AKFASGSSSRSFFGPVAAWDKDSG IYKV T+L +AMIMLVLN + 
Sbjct: 121 LFDTQLDQKAIAQKAKFASGSSSRSFFGPVAAWDKDSGAIYKVETDLKI^IMLVLNAAK 180 

Query: 181 KPISSREGMKICTETSTTFNEWVRQSEQDYQDMLVYLKNNDFQKVGQLTERNALAMHSTT 240 

KPISSREGMK+C +TSTTF++WV QS DYQ ML YLK N+F+KVGQLTE NALAMH+TT 
Sbjct: 181 KPISSREGMKLCRDTSTTFDQWVEQSAIDYQHMLTYLKTNNFEKVGQLTEANALAMHATT 240 


Query: 241 KTATPAFSYLTEETYKA^VVKKLREKGHECYYTiWAGPNVTCVLCIiRQDLEALAAILEKD 300 

KTA P FSYLT+E+Y+AM+ VK+LR++G CY+TMDAGPNVKVLCL +DL LA L K+ 
Sbjct: 241 KTANPPFSYLTKESYQAMFAVTCELRQEGFACTFTMDAGPNVKVLCLEKDLAQLAERLGKN 300 

Query: 301 YRIIVSTTKELAD 313

YRIIVS TK+L D
Sbjct: 301 YRIIVSKTKDLPD 313

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1932 

A DNA sequence (GBSx2041) was identified in S.agalactiae <SEQ ID 5989> which encodes the amino 
acid sequence <SEQ ID 5990>. Analysis of this protein sequence reveals the following:

Possible site: 59 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1512 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>
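
The "Final Results" blocks in this section all follow one line layout: a location, a certainty value and a verdict. As a minimal, non-authoritative sketch of how such lines could be tabulated, the Python below parses them and selects the location with the highest certainty. The helper names `parse_final_results` and `best_location` are hypothetical and introduced only for illustration; the regular expression assumes the layout seen in this text and tolerates stray spaces inside the certainty value.

```python
import re

# Assumed layout: "bacterial <location> [---] Certainty=<d>.<dddd> (<verdict>)".
# Whitespace around "=" and "." is tolerated because some lines contain it.
_LINE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*"
    r"Certainty\s*=\s*(\d)\s*\.\s*(\d+)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return a list of (location, certainty, verdict) tuples found in text."""
    results = []
    for loc, whole, frac, verdict in _LINE.findall(text):
        results.append((loc, float(f"{whole}.{frac}"), verdict.strip()))
    return results


def best_location(text):
    """Return the tuple with the highest certainty, or None if nothing parsed."""
    results = parse_final_results(text)
    return max(results, key=lambda r: r[1]) if results else None


example = """
bacterial cytoplasm Certainty=0. 1512 (Affirmative) < succ>
bacterial membrane Certainty=0 . 0000 (Not Clear) < succ>
bacterial outside Certainty=0 . 0000 (Not Clear) < succ>
"""
print(best_location(example))
```

The sketch only restates what the block already reports; it does not reproduce the localisation prediction itself.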

A related DNA sequence was identified in S.pyogenes <SEQ ID 5991> which encodes the amino acid
sequence <SEQ ID 5992>. Analysis of this protein sequence reveals the following:

Possible site: 35 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1117 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 182/290 (62%), Positives = 223/290 (76%)

Query: 1 MKEKFGIGKAHSKIILMGEHSWYGYPAIAIPLKNIEVTCLIEEAPQLIALDMTDPLSTA 60 

MEG GKAHSKI IL+GEH+WYGYPAIA+PL +IEV CI A + + D D LSTA 
Sbjct: 6 MNENIGYGKAHSKIILIGEHAWyGYPAIALPLTDIEWCHIFPADKPLVFDFYDTLSTA 65 

Query: 61 IFAALDYLGKTSSKIAYHIESQVPERRGMGSSAAVAIAAIRAVFDYFDEDLEADLLECLV 120 

I+A+LDYL + IAY I SQVP++RGMGSSAAV+IAAIRAVF Y EL DLLE LV 
Sbjct: 66 IYASLDYLQRLQEPIAYEIVSQVPQKRGMGSSAAVSIAAIRAVFSYCQEPLSDDLLEILV 125 

Query: 121 NRAEMIAHSNPSGLDAKTCLSENTIKFIRNIGFSTVPMHLNAYLVIADTGIHGHTKEAVD 180

N+AE+ IAH+NPSGLDAKTCLS++ IKFIRNIGF T+ + LN YL+IADTGIHGHT+EAV+ 
Sbjct: 126 NKAEI IAHTNPSGLDAKTCLSDHAIKFIRNIGFETIEIALNGYLI IADTGIHGHTREAVN 185 

Query: 181 KVKSSGEAVLPFLKELGYLAEASEDAIHKSDSKQLGSLMTKAHQSLKQLGVSSLEADHLV 240 
KV E LP+L +LG L +A E AI++ + +G LMT+AH +LK +GVS +AD LV

Sbjct: 186 KVAQFEETNLPYLAKLGALTQALERAINQKNKVAIGQLMTQAHSALKAIGVSISKADQLV 245 

Query: 241 EVAISCGALGAKMSGGGLGGCIIALVKEKREAERLSQQLEREGAvNTWTE 290 
E A+ GALGAKM+GGGLGGC+IAL K AE++S +L+ EGAVNTW + 
Sbjct: 246 EAALRAGALGAKMTGGGLGGCMIAIADTKDMAEKISHRLKEEGAVNTWIQ 295
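
Each alignment in this section is introduced by a summary line of the form "Identities = a/b (p%), Positives = c/d (q%)". The following Python sketch, offered under stated assumptions rather than as the method used to generate these figures, parses such a line and recomputes each percentage by integer truncation. The function name `parse_alignment_stats` is hypothetical. For the alignment above, truncating 182/290 and 223/290 gives 62 and 76, which agrees with the reported values.

```python
import re

# Matches fields such as "Identities = 182/290 (62%)"; spacing is flexible
# because the extracted text is not consistent about it.
_STAT = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_alignment_stats(line):
    """Return {field: {count, length, reported_pct, floor_pct}} for one summary line."""
    stats = {}
    for name, num, den, pct in _STAT.findall(line):
        num, den = int(num), int(den)
        stats[name] = {
            "count": num,
            "length": den,
            "reported_pct": int(pct),
            # Assumption: percentages are truncated, not rounded.
            "floor_pct": num * 100 // den,
        }
    return stats


stats = parse_alignment_stats("Identities = 182/290 (62%) , Positives = 223/290 (76%)")
for field, values in stats.items():
    print(field, values["floor_pct"], values["reported_pct"])
```

A mismatch between `floor_pct` and `reported_pct` for a given line would suggest either a different rounding convention or a digit damaged during text extraction.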

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1933 

A DNA sequence (GBSx2042) was identified in S.agalactiae <SEQ ID 5993> which encodes the amino
acid sequence <SEQ ID 5994>. This protein is predicted to be a histidine protein kinase. Analysis of this 
protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-13.43 Transmembrane 12 - 28 ( 4 - 33)
INTEGRAL Likelihood = -9.29 Transmembrane 163 - 179 ( 157 - 191)

Final Results

bacterial membrane Certainty=0.6371 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF79919 GB:AF039082 putative histidine protein kinase [Lactococcus lactis]

Identities = 78/315 (24%) , Positives = 154/315 (48%) , Gaps = 33/315 (10%) 

Query: 101 SDRQIKNYAKRIVSQNSHSGHITYNFSTYSYLLKKVGKNDYLWFLDTTNQYLDNQRLLQ 160 
+++QI N + + +N+ + Y + TS + V++ + Q + 

Sbjct: 84 NEKQI-NTIQTOSVKNPYGDNWHYRYLTTSQFIITNSDGTVTPVYVQIFSNVDQIQDAMS 142

Query: 161 LSIWM SLVSFIVFMVIVSV-LSGRVILPFVANYEKQRRFITNAGHELKTPLAIISAN 216 

++W+ ++++F + VI+S+ L+ + P +A YEKQ+ F+ NA HEL+TPLAI+ 
Sbjct: 143 RAMWVIOTTMITFWILSVIISLYLANWTLKPILAAYEKQKEFVENASHELRTPLAILQNR 202 

Query: 217 NELV EMMSGESEWTKSTNDQIQRLTGLINGMVSLAR FEEQPDISM 261 

EL+ + +SE + +++ + L + +++LAR E +P + 

Sbjct: 203 LELLFQKPTATIIDQSENISESLSEVRNMRLLTSNLLNLARRDSGIKIEPEPTTATYFEN 262 

Query: 262 VDLDFSHITKDAAEDFKGPIIKDGKDFIMSIQPGIHVKAEEKSLFELVTLLVDNANKYCD 321
+ + +T++A + F G + +G V ++ + +L+T+L DNA KY D 

Sbjct: 263 I FNS YEMLTENAGKKFSGNLKLEGT VNLDQALIKQLLTILFDNALKYTD 311 

Query: 322 PMGTVTVKLSRSSRLRRAKLEVSNTYKNGKDIDYSKFFERFYREDESHNNKKSGYGIGLS 381 
G ++V + ++ V++ + DDK F+RF+R D++ +K G G+GLS

Sbjct: 312 SEGEISVDVIKNGGF--LTFAVADNGEGISDEDKKKIFDRFFRVDKARTRQKGGLGLGLS 369 

Query: 382 IVTSLVHLFKGSIDV 396 
+ +V + G I V 
Sbjct: 370 liAKQIVEAYNGKITV 384

A related DNA sequence was identified in S.pyogenes <SEQ ID 5751> which encodes the amino acid
sequence <SEQ ID 5752>. Analysis of this protein sequence reveals the following: 

Possible site: 24 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.30 Transmembrane 18 - 34 ( 13 - 42)
INTEGRAL Likelihood =-10.35 Transmembrane 170 - 186 ( 163 - 199)

Final Results

bacterial membrane Certainty=0.5522 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 233/410 (56%), Positives = 303/410 (73%), Gaps = 1/410 (0%)

Query: 1 MFRNLRLRFIGIAAIAILWLFSWGVLNSANHYQTKNEIYRVLTILADNNGRIPNKLEF 60 
MF +R+RFI IA++AI ++L S+VG++N+A YQ++ EI R+L +++ N G++P E 
Sbjct: 10 MFNRIRIRFIMIASIAIFIILSSIVGIINTARCYQSQQEINRILHLISSNKGKLPGTTES 69

Query: 61 SKELGDDLSTDAIFQFRYFSARTDAKGNVTSFDSRNIFEVSDRQIKNYAKRIVSQNSHSG 120 

SK LG LS D++ QFRY+S +A G++ S ++ NI + + + +A+ G 
Sbjct: 70 SKRLGTKLSEDSLSQFRYYSVIFNANGHLLSSNTANISALDREEAQYFARLFAKSGEEKG 129 

Query: 121 HITYNFSTYSYLLKKVGKNDYLWFLDTTNQYLDNQRLLQLSIWMSLVSFIVFMVIVSVL 180

+ S YSYL+ ++ + LW LDTT + LL +S+ ++ FI F+V+VS+ 

Sbjct: 130 SYRHQDSWSYLITQLPNEEKLWILDTTFYFRSVGDLIAVSVMLAFGGFIFFVVLVSLF 189 



Query: 181 SGRVILPFVANYEKQRRFITNAGHELKTPLAIISANNELVEMMSGESEWTKSTNDQIQRL 240 

SG VI PFV NYEKQRRFITNAGHELKTPLAIISANNELVE+M+GESEWTKST+DQ++RL
Sbjct: 190 SGWIKPFVQNYEKQRRFITNAGHELKTPLAI I SANNELVEIiMTGESEWTKSTSDQVKRL 249 

Query: 241 TGLINGMVSLARFEEQPDISMVDLDFSHITKDAAEDFKGPIIKDGKDFIMSIQPGIHVKA 300 
TGLIN M++LAR EEQPD+ + +DFS I +DAAEDFK ++KDGK F ++IQP I +KA

Sbjct: 250 TGLINQMITLARLEEQPDWLHMVDFSAIAQDAAEDFKSLVLKDGKRFDLTIQPNIMIKA 309 

Query: 301 EEKSLFELVTLLVDNANKYCDPMGTVTVKLSRSSRLR-RAKLEVSNTYKNGKDIDYSKFF 359
EEKSLFELVT+LVDNANKYCDP G V V L+ R R RAKLEVSNTY GK IDYS+FF 
Sbjct: 310 EEKSLFELWILVDNANKYCDPKGLVKVSLTTIGRRRKRAKLEVSNTYLEGKSIDYSRFF 369

Query: 360 ERFYREDESHNNKKSGYGIGLSIVTSLVHLFKGSIDVNYKHDTITFVIYI 409 

ERFYREDESHN+K+ GYGIGLS+ S+V LFKG+I VNYK+D I F + I 
Sbjct: 370 ERFYREDESHNSKEKGYGIGLSMAESIWKLFKGTITVNYKNDAIVFTVVI 419 

SEQ ID 5994 (GBS273) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 51 (lane 14; MW 46kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 56 (lane 5; MW 71kDa). 

GBS273-GST was purified as shown in Figure 208, lane 4. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1934 

A DNA sequence (GBSx2043) was identified in S.agalactiae <SEQ ID 5995> which encodes the amino 
acid sequence <SEQ ID 5996>. Analysis of this protein sequence reveals the following: 

Possible site: 16

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2181 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1935 

A DNA sequence (GBSx2044) was identified in S.agalactiae <SEQ ID 5997> which encodes the amino 
acid sequence <SEQ ID 5998>. This protein is predicted to be a two-component response regulator (trcR).
Analysis of this protein sequence reveals the following:

Possible site: 56 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2503 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9379> which encodes amino acid sequence <SEQ ID 9380>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB04091 GB:AP001508 two-component response regulator [Bacillus halodurans]
Identities = 71/183 (38%), Positives = 120/183 (64%), Gaps = 3/183 (1%)

Query: 9 RVLIAEDEEQMSRVLSTAISHQGYVVDVAYDGQTAIDLMQNAYDWIVMDVMMPVKTGIE 68

R+LI EDE++++RVL + H+GY D A-h G ++ +A+D++++DVM+P +G+E 
Sbjct: 3 RILIIEDEKKIARVLQLELEHEGYETDAAFSGSDGr.ETFQAHAWDLVLLDVMLPELSGLE 62 

Query: 69 AVKEIRQSGMSHIIMLTAMAElDDRWGLDAGADDYLTKPFSLKELIARLRSMSRRIiE- 127 
++ IR + + II+LTA I D+V+GLD GA+DY+TKPF ++EUAR+R+ R ++

Sbjct: 63 VLRRIRMTDPVTPIILLTARNSIPDKYSGLDIGANDYITKPFEIEELIjARVRACLRTVQT 122 

Query: 128 -DFTPNVLSLGRVTLSVGEQELQCEN-T1RIAGKEAKMLAFFMLNHDKELSTQQLFEHVW 185 
+ + L +T++ B TI 1 KE ++L FF+ N + LS +Q+ +VW 

Sbjct: 123 RERVEDTLMFQELTINEKTRDVQRGlffiTIELTPKEFELLVFFIKNKGQVLSREQILTNVW 182



Query: 186 GAD 188 
G D 

Sbjct: 183 GFD 185 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5999> which encodes the amino acid 
sequence <SEQ ID 6000>. Analysis of this protein sequence reveals the following: 



Possible site: 21 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2391 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 125/185 (67%) , Positives = 151/185 (81%) 

Query: 8 MRVLIAEDEEQMSRVLSTAISHQGT\ATOVAYDGQTAIDLAN^ 67 
M++L+AEDE QMS VL+TA++HQGY VDV ++GQ AID A NAYD+M++D+MMP+K+GI

Sbjct: 1 MKILtAEDEWQMSNVLTTAMTHQGYDVOWFNGQEAIDKAKDNAYDIMILDIMMPIKSGI 60 

Query: 68 EAVKEIRQSGNKSHIIMLTAMAEIDDRVTGLDAGADDYLTKPFSLKELLARLRSMSRRLE 127
EA+KEIR SGN SHIIMLTAMAEI+DRVTGLDAGADDYLTKPFSLKELLARLRSM RR+E 
Sbjct: 61 EALKEIRASGNCSHIIMLTAMAEI^RVTGIiDAGADDYIjTKPFSLKELLARLRSMERRVE 120



Query: 128 DFTPNTOSIGRVTLSVGEQELQCEOTriRIAGKEAKMIAFFMIJSraDKELSTQQLFEHVWGA 187 
ftp vr. vrr,+4. v.rw.T, w TPT.a w. ifj-iiu mt.w v t. t_l ■t.j.^.mmi 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05604 GB:AP001513 unknown conserved protein [Bacillus halodurans]
Identities = 67/182 (36%) , Positives = 111/182 (60%) , Gaps = 4/182 (2%) 

Query: 17 LEDFSQRIQLENDKAKVETGYKLYEHIIGRIKTSDSMIEKCRRKQLPVTVDSALKTIRDS 76
L++ + +1 + + + Y EH+ R+K+ +S++ K +R+ T++S + +RD 
Sbjct: 29 LQELNTKIDILKQEFQYIHDYNPIEHVSSRVKSPESIVNKIQRRGNDFTLESIRENVRDI 88



Query: 77 IGVRIICGFVNDIYQIIERIKAFDDCRIVVEKDYIQHVKPNGYRSYHVILEIDTPYPDCL 136

G+RI C F +DIY + E++ D +V KDYI++ KPNGYRS H+IL I P + 
Sbjct: 89 AGIRITCSFESDIYTLSEQLMQQHDISWETKDYIKNPKPNGYRSLHLILSI PIFM 144 

Query: 137 GNSDGKYYIEIQLRTIAQDSWASLEHQMKYKHDIENPERIVRELKRCADEMASVDLTMQT 196

+ Y+E+Q+RTIA D WASLEH++ YK++ PE +++ELK A+ A +D M+ 

Sbjct: 145 SDRVQDVYVEVQIRTIAMDFWASLEHKIYYKYNKNVPEHLLKELKDAAESAALLDQKMEK 204 

Query: 197 IR 198

I+

Sbjct: 205 IQ 206 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6003> which encodes the amino acid 
sequence <SEQ ID 6004>. Analysis of this protein sequence reveals the following:

Possible site: 33 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.1057 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 127/206 (61%), Positives = 162/206 (77%)

Query: 3 TNIYGDYGRYLPLILEDFSQRIQLENDKAKVETGYKLYEHIIGRIKTSDSMIEKCRRKQL 62 

++IY + YLPL+L+ + I EN K+K ETG+KLYEH RIK+ SMIEKC+RKQL 
Sbjct: 11 SSIYSGFEVYLPLVLQTITDVIIAENIKSKKETGFKLYEHFTSRIKSEASMIEKCQRKQL 70 

Query: 63 PVTVDSALKTIRDSIGVRIICGFVNDIYQIIERIKAFDDCRIVVEKDYIQHVKPNGYRSY 122

P+T SALK I+DSIG+RIICGF++DIY++++ +K+ + EKDYI + KPNGYRSY 

Sbjct: 71 PLTSKSALKIIKDSIGIRIICGFIDDIYRMVDLLKS1PGMSVNTEKDYILNAKPNGYRSY 130 

Query: 123 HVILEIDTPYPDCLGNSDGKYYIEIQLRTIAQDSWASLEHQMKYKHDIENPERIVRELKR 182

H+ILE++T +PD LG G Y+IE+QLRTIAQDSWASLEHQMKYKH + N E I RELKR 
Sbjct: 131 HLILELETHFPDILGEKKGCYFIEVQLRTIAQDSWASLEHQMKYKHQVANAEMITRELKR 190 

Query: 183 CADEMASVDLTMQTIRQLIESGTKKE 208 
CADE+AS D+TMQTIRQLI+ T++E

Sbjct: 191 CADELASCDVTMQTIRQLIQETTEEE 216 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1937

A DNA sequence (GBSx2046) was identified in S.agalactiae <SEQ ID 6005> which encodes the amino 
acid sequence <SEQ ID 6006>. Analysis of this protein sequence reveals the following: 



Possible site: 40 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.3250 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA37193 GB:X53013 ORF1 (AA 1 - 384) [Lactococcus lactis] 
Identities = 30/55 (54%) , Positives = 37/55 (66%) 

Query: 1 MEFYYKTLKRKFINDADTIFIEQSQFEIFIYIETDHNSSSSHWLDYQSQKEFEK 55 

ME +YKTLKR+ INDA ++ EIF YIET +N+ H LDYQS K+FEK 

Sbjct: 327 MESFYKTLKRELINimHFETRAEaTQEIFKYIETYYNTKWMHSGLDYQSPKDFEK 381 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6007> which encodes the amino acid
sequence <SEQ ID 6008>. Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.3065 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 31/59 (52%) , Positives = 39/59 (65%) 

Query: 1 MEFYYKTLKRKFINDADTIFIEQSQFEIFIYIETDHNSSSSRAA/LDYQSQKEFEKIITN 59 
ME +YKTLKR+ +NDA I+Q+Q EIF Y ET +N H L Y S EFEKI+T+ 
Sbjct: 13 MEAFYKTLKRELvNDAHFATIKOAQLEIFKYSETYYNPKRLHSALGYLSPVEFEKIVTH 71

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1938 

A DNA sequence (GBSx2047) was identified in S.agalactiae <SEQ ID 6009> which encodes the amino
acid sequence <SEQ ID 6010>. This protein is predicted to be R5 protein. Analysis of this protein sequence 
reveals the following: 

Possible site: 51 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -3.98 Transmembrane 30 - 46 ( 29 - 51)

INTEGRAL Likelihood = -2.76 Transmembrane 967 - 983 ( 966 - 985)

Final Results

bacterial membrane Certainty=0.2593 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8935> which encodes amino acid sequence <SEQ ID 8936> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8

SRCFLG: 0 

McG: Length of UR: 2 

Peak Value of UR: 2.44 
Net Charge of CR: 2 
McG: Discrim Score: 0.78

GvH: Signal Score (-7.5): -0.0599995 

Possible site: 39
>>> Seems to have a cleavable N-term signal seq.
Amino Acid Composition: calculated from 40
ALOM program count: 0 value: 7.37 threshold: 0.0
PERIPHERAL Likelihood = 7.37. 194

modified ALOM score: -1.97

*** Reasoning Step: 3

Rule gpol

Final Results

bacterial outside -
bacterial membrane -
bacterial cytoplasm -

LPXTG motif: 944-948 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 8936 (GBS200) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 29 (lane 3; MW 107.4kDa), in Figure 169 (lane 4; MW 122kDa) and in Figure 
238 (lane 11; MW 122kDa). It was also expressed in E.coli as a GST-fusion product. SDS-PAGE analysis 
of total cell extract is shown in Figure 35 (lane 3; MW 132kDa). 

Purified Thio-GBS200-His is shown in Figure 244, lane 9. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1939 

A DNA sequence (GBSx2048) was identified in S.agalactiae <SEQ ID 6011> which encodes the amino
acid sequence <SEQ ID 6012>. This protein is predicted to be a 16.1 kDa transcriptional regulator. Analysis
of this protein sequence reveals the following:

Possible site: 25 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3919 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9953> which encodes amino acid sequence <SEQ ID 9954> 
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB16108 GB:Z99124 similar to transcriptional regulator (MarR 
family) [Bacillus subtilis] 
Identities = 30/114 (26%) , Positives = 59/114 (51%) , Gaps = 3/114 (2%) 

Query: 29 DvEHLAGPQGHLVMYLYKHPDKDMSIKAvEEIMISKBVASNLVTO?MEKNGFIAIVPSKT 88 

D++ G +LV +Y++P + + + E++ + ++ A+ +K++E GFI +P + 
Sbjct: 25 DLDLTRGQYLYLTO-IYENPG--IIQEKLAEMIKVDRTTAARAIKKLEMQGFIQJCLPDEQ 81 



-- Certainty=0 .3000 (Affirmative) < suco 
-- Certainty= 0.0000 (Not Clear) < suco 
— Certainty=0. 0000 (Not Clear) < suco 



Query: 89 DKRVKYLYLTHLGKKKATQFEIFLEKLHSTMIAGITKEEIRTTKKVIRTLAKNM 142
+K++K L+ T GKK E L+G T EE T ++ + KN+
Sbjct: 82 NKKIKKLFPTEKGKKVYPLLRREGEHSTEVALSGFTSEEKETISALLHRVRKNI 135

A related DNA sequence was identified in S.pyogenes <SEQ ID 6013> which encodes the amino acid
sequence <SEQ ID 6014>. Analysis of this protein sequence reveals the following: 

Possible site: 43 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 27/64 (42%), Positives = 46/64 (71%)

Query: 3 MENPLQKARILVNQLEKYLDHYAKEYDVEHLAGPQGHLVMYLYKHPDKDMSIKAVEEILH 62 

M + R L++Q+E+ D AK+YDVEHLAGPQG+++++L KH ++++ +K +E+ L 
Sbjct: 1 MSQVIGDLRELIHQIEQISDEIAKKYDVEHLAGPQGYVLVFLAKHQNQEIFVKDIEKQLR 60 

Query: 63 ISKS 66 
I +S 

Sbjct: 61 IFQS 64 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1940 

A DNA sequence (GBSx2049) was identified in S.agalactiae <SEQ ID 6015> which encodes the amino 
acid sequence <SEQ ID 6016>. This protein is predicted to be a 5'-nucleotidase family protein. Analysis of
this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -2.66 Transmembrane 668 - 684 ( 665 - 684) 



>>> Seems to have no N-terminal signal sequence



Final Results 



bacterial cytoplasm Certainty=0.4175 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



Final Results 



bacterial membrane --- Certainty=0.2062 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12747 GB:Z99108 similar to 5'-nucleotidase [Bacillus subtilis]
Identities = 178/535 (33%) , Positives = 270/535 (50%) , Gaps = 55/535 (10%) 



Query: 28 DQVGVQVIGVNDFHGALDNTGTANMPDGKVANAGTAAQLD AYMDDAQKDFKQTNPNG 84 

+ V ++++ +ND HG +D ++ DG GT ++D AY+ + + + K 
Sbjct: 586 EHVPLRILSMNDLHGKIDQQYELDL-DGNGTVDGTFGRMDYAAAYLKEKKAEKKN 639 



Query: 85 ESIRVQAGDIWGASPANSGLLQDEPTVKNFNAMNVEYGTLGNHEFDEGLAEYNRI VTGKA 144 

S+ V AGDM+G S S LLQDEPTV+ + + GT+GNHEFDEG E RI+ G 
Sbjct: 640 -SLIVHAGDMIGGSSPVSSLLQDEPTVELMEDIGFDVGTVGNHEFDEGTDELLRILNG-G 697 



Query: 145 PAPDSNINNITKSYPHEAAKQEIVVANVIDKVNKQIPYNWKPYAIKNIPvNNKSVNVGFI 204 

p +++ p +v AN ++ +P+ +N + V V FI 

Sbjct: 698 DHPKGTSGYDGQNFP LVCANC KMKSTGEPFLPAYDI INVEGVP VAFI 744 



Query: 205 GIOTKDIPNLVLRKNYEQYEFLDFAETIvKYAKELCMCNVKAIVVLAHVPATSKN^ 264 

G+VT+ +V+ + + EF DEA + K A+EL+ K VKAI VLAH+ A + G 
Sbjct: 745 GVVTQSAAGlWMPEGIKNIEFTDFATAWKAAEELIOaCGvT<AIAVLAHMSAEQNGNAITG 804 



Query: 265 EAAEMMKKVNQLFPENSVDIVFAGHNHQYTNGLVGKTRIVQALSQGKAYADVRGVLDTDT 324
E+A++ K ++ +D++FA HNHQ NG V IVQA GKA V +D T
Sbjct: 805 ESADLANKT DSEIDVIFAAHNHQVVNGETOGKLIVQAFEYGKAIGVVDVEIDKTT 859

Query: 325 QDFIETPSAKVIAVAPGKKTGSADIQAIVDQANTIVKQVTEAKIGTAEVSVMITRSVDQD 384
+D ++ SA+++.V K AI+ + TI + + +G A V + S D D,
Sbjct: 860 KDIVK-KSAEIVYVDQSKIEPDVSASAILKKYETIAEPIISEWGEAAVDMEGGYSNDGD 918

Query: 385 NVSPVGSLITEAQLAIARKSWPDIDFAMTNNGGIRADLLIKPDGTITWGAAQAVQPFGNI 444
+P+G+LI + A + DFA+ N GGIR L G ITWG +QPFGN+
Sbjct: 919 - -TPLGNLIADGMRAAMK TDFALMNGGGIREAL- - -KKGPITWGDLYNIQPFGNV 968

Query: 445 LQVVEITGRDLYKALNEQYDQKQNFFLQIAGLRYTYTDNKEGGEETPFKVVKAYKSNGEE 504
L +EI G+DL + +N Q I+G +TYT +KE G+ K+ ++G E
Sbjct: 969 LTKLE I KGKDLRE I INAQIS PVFGPDYSISG- - FTYTWDKETGKAVDMKM ADGTE 1021

Query: 505 INPDAKYKLVINDFLFGGGDGFASFRNAKLLGAINP DTEVFMAYI TDLEK 554
I PDA Y L +N+F+ A ++ LLG NP D E + Y+ ++
Sbjct: 1022

A related DNA sequence was identified in S.pyogenes <SEQ ID 1607> which encodes the amino acid
sequence <SEQ ID 1608>. Analysis of this protein sequence reveals the following:

Possible site: 40

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.67 Transmembrane 662 - 678 ( 661 - 679)
INTEGRAL Likelihood = -2.02 Transmembrane 19 - 35 ( 18 - 35)

Final Results 

bacterial membrane --- Certainty=0.2869 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 415/688 (60%) , Positives = 517/688 (74%) , Gaps = 21/688 (3%) 

Query: 1 MKKKIILKSSVLGLVAGTSIMFSSVFADQVGVQVIGVNDFHGALDNTGTANMPDGKVANA 60 

MKK ILKSSVL ++ +++ + V ADQV VQ +GVNDFHGALDNTGTA P GK+ NA 
Sbjct: 14 MKKYFILKSSVLSILTSFTLLVTDVQADQVDVQFLGVNDFHGALDNTGTAYTPSGKIPNA 73 

Query: 61 GTAAQLDAYMDDAQKDFKQTNPNGESIRVQAGDMVGASPANSGLLQDEPTVKNFNAMNVE 120 

GTAAQL AYMDDA+ DFKQ N +G SIRVQAGDMVGASPANS LLQDEPTVK FN M E 
Sbjct: 74 GTARQLGAYMDDAEIDFKQANQDGTSIRVQAGDMVGASPANSALLQDEPTVKVFNKMKFE 133 

Query: 121 YGTLGNHEFDEGLAEYNRIVTGKAPAPDSNINNITKSYPHEAAKQEIVVANVIDKVNKQI 180

YGTLGNHEFDEGL E+NRI +TG+AP P+S IN+ITK Y HEA+ Q IV+ANVIDK K I 
Sbjct: 134 YGTLGNHEFDEGLDEFNRIMTGQAPDPESTINDITKQYEHEASHQTIVIANVIDKKTKDI 193 



Query: 181 PYNWKPYAIKNIPVNNKSVNVGFIGIVTKDIPNLVLRKNYEQYEFLDEAETIVKYAKELQ 240

PY WKPYAIK+I +N+K V +GFIG+VT +IPNLVL++NYE Y+FLD AETI KYAKELQ 
Sbjct: 194 PYGWKPYAIKDIAINDKIVKIGFIGVVTTEIPNLVLKQNYEHYQFLDVAETIAKYAKELQ 253 

Query: 241 AKNVKAIVVLABT^PATSKiroiAEGFjAAEMMKKVNQLFPFjN^ 300 

++V AIWLAHVPATSK+ + + E A +M+KVNQ++PE+S+DI+FAGHNHQYTNG +GK 
Sbjct: 254 EQHVHAIVVLAHVPATSKDGVVDHEMATvMEKVNQIYPEHSIDIIFAGHNHQYTNGTIGK 313 

Query: 301 TRIVQALSQGKAYADVRGVLDTDTQDFIETPSAKVIAVAPGKKTGSADIQAIVDQANTIV 360

TRIVQALSQGKAYADVRG LDTDT DFI+TPSA V+AVAPG KT ++DI+AI++ AN IV 
Sbjct: 314 TRIVQALSQGKAYADVRGTLDTDTNDFIKTPSANVVAVAPGIKTENSDIKAIINHANDIV 373 

Query: 361 KQVTEAKIGTAEVSVMITRSVDQDNVSPVGSLITEAQLAIARKSWPDIDFAMTNNGGIRA 420

K VTE KIGTA S I+++ + D SPVG+L T AQL IA+K++P +DFAMTNNGGIR+ 
Sbjct: 374 KTVTERKIGTATNSSTISKTENIDKESPVGNIATTAQLTIAKKTFPTVDFAMTNNGGIRS 433 

Query: 421 DLLIKPDGTITWGAAQAVQPFGNILQVVEITGRDLYKALNEQYDQKQNFFLQIAGLRYTY 480

DL++K D TITWGAAQAVQPFGNILQV+++TG+ +Y LN+QYD+ Q +FLQ++GL YTY 
Sbjct: 434 DLVVKNDRTITWGAAQAVQPFGNILQVIQMTGQHIYDVLNQQYDENQTYFLQMSGLTYTY 493 

Query: 481 TDNKEGGEETPFKVVKAYKSNGEEINPDAKYKLVINDFLFGGGDGFASFRNAKLLGAINP 540

TDN +TPFK+VK YK NGEEIN Y +V+NDFL+GGGDGF++F+ AKL+GAIN 

Sbjct: 494 TDITOPKNSDTPFKIVKVYKDNGEEINLTTTYTVVVNDFLYGGGDGFSAFKKAKLIGAINT 553 

Query: 541 DTEVFMAYITDLEKAGKKVSVPNNKPKIYVTMKMVNETITQNDGTHSIIKKLYLDRQGNI 600 

DTE F+ YIT+LE +GK V+ K YVT + + T + G HSII K++ +R GN 

Sbjct: 554 DTEAFI TYI TNLEASGKTVNATI KGVKNYVTSNLESSTKVNSAGKHS 1 I SKVFRNRDGNT 613 

Query: 601 VAQEIVSDTLNQTKSKSTKINPVTTIHKKQLHQFTAINPMRNYGKPSNSTTVKSKQLPKT 660 

V+ E++SD L T++ + + T +N T+ S LP T 

Sbjct: 614 VSSEVISDLLTSTENTNNSLGKKET TTNKNTISSSTLPIT 653 

Query: 661 NSEYGQSFLMSVFG-VGLIGIALNTKKK 687 

Y S +M++ + L G+ KK+ 
Sbjct: 654 GDNYKMSPIMTILALISLGGLNAFIKKR 681 

SEQ ID 6016 (GBS328) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 69 (lane 4; MW 73kDa). The GBS328-His fusion product was purified (Figure 
213, lane 9) and used to immunise mice. The resulting antiserum was used for FACS (Figure 268), which 
confirmed that the protein is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1941 

A DNA sequence (GBSx2050) was identified in S.agalactiae <SEQ ID 6017> which encodes the amino 
acid sequence <SEQ ID 6018>. This protein is predicted to be peptide deformylase (def-2). Analysis of this 
protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.70 Transmembrane 55 - 71 ( 55 - 74)

Final Results

bacterial membrane Certainty=0.1680 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB09662 GB:Z96934 peptide deformylase [Clostridium beijerinckii]

Identities = 71/136 (52%) , Positives = 96/136 (70%) 

Query: 1 MIKPIVRDTFFLQQKSQMASRADVSLAKDLQETLHANQNYCVGMAANMIGSLKRVIIINV 60 

MIKPIV+D FL QKS+ A++ D+ + DL +TL AN +CVG+AANMIG KR+++ V 
Sbjct: 1 MIKPIVKDILFLGQKSEEATKNDrOTIDDLIOTLRANLEHCVGIAANMIGVTOOlILVFW 60 

Query: 61 GITNLVMFNPVVVAKSDPYETEESCLSLVGCRSTQRYCHITISYRDINWKEQQIKLTDFP 120 

G + M NPV++ K PYETEESCLSL+G R T+RY I ++Y D N+ +++ F 
Sbjct: 61 GNLIVPMINPVILKKEKPYETEESCLSLIGFRKTKRYETIEVTYLDRNFNKKKQVFNGFT 120 

Query: 121 AQICQHELDHLEGILI 136 

AQI QHE+DH EGI+I 
Sbjct: 121 AQIIQHEMDHFEGIII 136

A related DNA sequence was identified in S.pyogenes <SEQ ID 6019> which encodes the amino acid
sequence <SEQ ID 6020>. Analysis of this protein sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -3.61 Transmembrane 55 - 71 ( 55 - 73)

Final Results

bacterial membrane --- Certainty=0.2444 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 77/136 (56%) , Positives = 103/136 (75%) 

Query: 1 MIKPIVRDTFFLQQKSQMASRADVSLAKDLQETLHANQNYCVGMAANMIGSLKRVIIINV 60

MI+ I+ D F LQQK+Q+A + D+ + +DLQ+TL + C+GMAANMIG KR++I+++
Sbjct: 1 MIREIITDHFLLQQKAQVAKKEDLWIGQDLQDTLAFYRQECLGMAANMIGEQKRIVIVSM 60 

Query: 61 GITNLVMFNPVVVAKSDPYETEESCLSLVGCRSTQRYCHITISYRDINWKEQQIKLTDFP 120

G +LVMFNPV+V+K Y+T+ESCLSL G R TQRY IT+ Y D NW+ +++ LT 
Sbjct: 61 GFIDLVMFNPVMVSKKGIYQTKESCLSLSGYRKTQRYDKITVEYLDHNWRPKRLSLTGLT 120 

Query: 121 AQICQHELDHLEGILI 136 

AQICQHELDHLEGILI 
Sbjct: 121 AQICQHELDHLEGILI 136 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1942 

A DNA sequence (GBSx2051) was identified in S.agalactiae <SEQ ID 6021> which encodes the amino 
acid sequence <SEQ ID 6022>. Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2880 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05820 GB:AP001514 NADP-specific glutamate dehydrogenase
[Bacillus halodurans]
Identities = 298/444 (67%) , Positives = 362/444 (81%) , Gaps = 2/444 (0%) 

Query: 7 YVASVLEKVKKQNEHEEEFLQAVEEVFESLVPVFDKYPQYIEENLLERLVEPERVISFRV 66 

YV V E VK++N +E EF QAV+EVF+SL+PV K+PQY+++ +LER+VEPERVISFRV 
Sbjct: 16 WQHVYETVKRRNPNEHEFHQAVKEVFDSLLPVLVKHPQYVKQAILERIVEPERVISFRV 75 

Query: 67 PWVDDKGQVQVNRGYRVQFSSAIGPYKGGLRFHPTVTQSIVKFLGFEQIFKNSLTGLPIG 126 

PWVDD+G VQVNRG+RVQF+SA+GPYKGGLRFHP+V SI+KFLGFEQIFKN+LTG PIG 
Sbjct: 76 PWVDDQGNVQVNRGFRVQFNSALGPYKGGLRFHPSVNASIIKFLGFEQIFKNALTGQPIG 135 

Query: 127 GGKGGSNFDPKGKSDNEVMRFTQSFMTELQKYIGPDLDVPAGDIGVGGREIGYLYGQYKR 186 

GGKGGS+FDPKGKSD E+MRF+QS FM+EL YIGPD+DVPAGDIGVG +EIGY++GQYK+ 
Sbjct: 136 GGKGGSDFDPKGKSDGEIMRFSQSFMSELSNYIGPDIDVPAGDIGVGAKEIGYMFGQYKK 195 

Query: 187 L-NGYQNGVLTGKGLTYGGSIiARTFATGYGAvYFAKEMIAARGQDLTGKVALVSGSGNVA 245 

+ G++ GVLTGKG+ YGGSLAR EATGYG VYF +EM+ G G +VSGSGNV+ 
Sbjct: 196 ^GGFFAGVLTGKGIGYGGSIiARKEATGYGTWFVEEMIKDHGFSFAGSTVWSGSGNVS 255 

Query: 246 IYATEKLQELGATWAVSDSSGYVYDPDGIDLETLKQIKEVERARIVKYTEKHPKANFTP 305 

IYA EK +LGA WA SDS GYVYD +GIDL+T+K++KEVER RI +Y +HP A++ 
Sbjct: 256 IYA^KAMQLGMOTACSDSGGYVYDKNGIDLC™ 315 

Query: 306 ADQGS I WS I KADLAFPCATQNELDEEDAKLL VENGVIAVTEGANMPSTLGAI KVFQKAGV 365 




G IWS+ D+A PCATQNELDE A +L+ NGV AV EGANMPSTL A+ FQ+ GV
Sbjct: 316 GCSGrlWSVPCDIALPCATQNELDEAAATMLIANGVKAVGEGANMPSTLQAVHTFQEHGV 374 

Query: 366 AFGPAKAANAGGVAVSALEMAQNSSRRAWTFEEVDQELQRIMKTIFVNASEAADEFGDSG 425 
F PAKAANAGG V+ VS ALEMAQNS +R AWTFEEVD +L IMK 1+ + +AA+ + SG

Sbjct: 375 LFAPAKAANAGGVSVSALEMAQNSTRLAWTFEEVDAKLYEIMKNIYRESIKAAELYEASG 434 

Query: 426 NL VLGANIAGFLKVAQAMSAQG IV 449 
NLV+GANIAGF+KVA AM + G+V 
Sbjct: 435 NLWGANIAGFVTOTADAMISHGW 458

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1943

A DNA sequence (GBSx2052) was identified in S.agalactiae <SEQ ID 6023> which encodes the amino 
acid sequence <SEQ ID 6024>. Analysis of this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane Certainty=0.4418 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9955> which encodes amino acid sequence <SEQ ID 9956> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
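
The "Final Results" lines reported for each sequence follow a regular pattern (predicted location, certainty score, qualitative call), so they can be read mechanically. The following is a minimal Python sketch under that assumption; it tolerates the stray spaces that appear inside the scanned scores. The names `parse_final_results` and `best_localization` are hypothetical helpers introduced here for illustration and do not belong to any prediction tool.

```python
import re

# Assumed layout: "bacterial <location> [---] Certainty=<score> (<call>) ..."
# The score may contain stray spaces from scanning, e.g. "0. 4418" or "0 . 0000".
LINE_RE = re.compile(
    r"^(bacterial \w+)\W*Certainty\s*=\s*(\d\s*\.\s*\d+)\s*\(([^)]+)\)"
)


def parse_final_results(text):
    """Return a list of (location, certainty, call) tuples found in text."""
    results = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            location, raw_score, call = match.groups()
            score = float(re.sub(r"\s+", "", raw_score))
            results.append((location, score, call))
    return results


def best_localization(results):
    """Return the entry with the highest certainty score."""
    return max(results, key=lambda entry: entry[1])


# Values copied from Example 1943; spacing left as scanned on purpose.
EXAMPLE_1943 = """
bacterial membrane Certainty=0. 4418 (Affirmative) < succ>
bacterial outside Certainty=0 . 0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < succ>
"""

results_1943 = parse_final_results(EXAMPLE_1943)
top_1943 = best_localization(results_1943)
print(top_1943)
```

Selecting the maximum score is only one possible reading of the output; the text itself reports all three scores without ranking them.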

Example 1944 

A DNA sequence (GBSx2053) was identified in S.agalactiae <SEQ ID 6025> which encodes the amino 
acid sequence <SEQ ID 6026>. This protein is predicted to be ABC transporter, ATP-binding protein 
(msbA). Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial membrane Certainty=0.5288 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB69752 GB:AL137187 putative ABC transporter [Streptomyces coelicolor A3 (2)] 
Identities = 269/611 (44%) , Positives = 392/611 (64%), Gaps = 31/611 (5%) 

Query: 9 RLWS YLTRYKATLFLAI FLKVLS SFMS I LEPFILGLAITELTANLV- - DMAKG -- 59 

RL S +ATLF + V+S ++++ P ILG A + A +V DM G 

Sbjct: 27 RLVSQFRPERATLFTLIACVWSVGLNVVGPKILGRATDLVFAGIVGRDMPSGATKEQVL 86 

Query: 60 VSGAELNVPYIAGILIIYFFRGVFYELGSYGSNYFMTTW 99 

V G ++ + +L++ L + + V 

Sbjct: 87 ATMREHGDGWADMLRSTDFVPGQGIDFGAVGEVIiLLAIATFAVAGLLMAVATRLvNRAV 146 

Query: 100 QKSIRDIRHDLNRKINKVPVSYFDKHQFGDMLGRFTSDVETVSNALQQSFLQIINAFLSI 159 

+++ +R D+ K++++P+SYFDK Q G++L R T+D++ + LQQS Q+IN+ L+I 
Sbjct: 147 NRTMFRLREDVQTKLSRLPLSYFDKRQRGEVLSRATNDIDNIGQTLQQSMGQLINSLLTI 206 

Query: 160 ILVVAMVLYLIWPLAMIIIACIPVTYFSAQAILKRSQPYFKEQAKILGELNGFVQEKLTG 219 
I V+ M+ Y++ LA++ + +P+++ A + KRSQP F +Q + G+LM ++E TG 

Sbjct: 207 IGVXAMMFWSWILALVALVTVPLSFWATRVGKRSQPQFVQQWRSTGQIMAHIEEMYTG 266 

Query: 220 FNIIKLYGREEASSQEFRDITDNLRHVGFKASFISGIMMPVLNSISDFIYLIIAFVGGLQ 279 
++K++GR+E S+++F + D L GFKA F SGIM P++ +S+ Y+++A VGGL+ 
Sbjct: 267 HALVWFGRQEESAKQFAEQNDALYEAGFKAQFNSGIMQPLMMCTSNLN^ 326 

Query: 280 VIAGTLTIGNMQAFVQYVWQISQPVQTITQLAGVLQSAKSSLERIFEVLD-EEEEANQVT 338 

V +G L+IG++QAF+QY Q S P+ + +A ++QS +S ER+FE+LD EE+ A+ + 
Sbjct: 327 VASGQLSIGDVQAFIQYSRQFSMPLTQvASMANLVQSGVASAERVFELLDAEEQSADPIP 386 

Query: 339 EKLSHDLTGQVSFHGVIJFHYSPDKPLIRDFNLDVEPGQMIAIVGPTGAGKTTLINLLMRF 398 

DL G+V V F Y P+KPLI D +L VEPG +AIVGPTGAGKTTL+NLLMRF 
Sbjct: 387 GARPEDLRGRVELEHVSFRYDPEKPLIEDLSLKVEPGHTVAIVGPTGAGKTTLVNLLMRF 446 

Query: 399 YDVSEGAITVDGHDIRHLSRQDFRQQFGMVLQDAWLYEGTIKENLRFG-NLEASDEDIVA 457 

Y+VS G IT+DG DI +SR + R GMVLQD WL+ GTI EN+ +G + E + +I 
Sbjct: 447 YEVSGGRITLDGVDIAKMSRDELRAGIGMVLQDTWLFGGTIAENIAYGASREVTRGEIEE 506 

Query: 458 AAKAANVDHFIRTLPGGYNMVMNQESSNISLGQKQLLTIARALLADPKILILDEATSSVD 517 
AA+AA+ D F+RTLP GY+ V++ E + +S G+KQL+TIARA L+DP IL+LDEATSSVD 

Sbjct: 507 AARAAHADRFVRTLPDGYDTVIDDEGTGVSAGEKQLITIARAFLSDPVILVLDEATSSVD 566 

Query: 518 TRLELLIQKAMKKLMEGRTSFVIAHRLSTIQEADNILVLKDGQIIEQGNHQKLLADKGFY 577 
TR E+LIQKAM KL GRTSFVIAHRLSTI++AD ILV++DG I+EQG H +LL G Y 
Sbjct: 567 TRTEVLIQKAMAKLAHGRTSFVIAHRLSTIRDADTILVMEDGAIVEQGAHTELLTADGAY 626 

Query: 578 YELYNSQFSNS 588 

LY +QF+ + 
Sbjct: 627 ARLYKAQFAEA 637 


There is also homology to SEQ IDs 160 and 6546. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
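
Each alignment in this section is introduced by a summary line giving identities, positives and gaps as counts over the aligned length. The following is a short Python sketch of how such a line could be parsed and cross-checked. It assumes the reported percentages are truncated integers, which holds for the three values in the summary line quoted here; `parse_alignment_summary` and `truncated_percent` are hypothetical names chosen for illustration.

```python
import re

# Matches e.g. "Identities = 269/611 (44%)"; spacing around "=" and "/" may vary.
STAT_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_alignment_summary(line):
    """Return {statistic: (count, aligned_length, reported_percent)}."""
    return {
        name: (int(count), int(total), int(percent))
        for name, count, total, percent in STAT_RE.findall(line)
    }


def truncated_percent(count, total):
    """Integer percentage, truncated rather than rounded (an assumption)."""
    return 100 * count // total


# Summary line of the Example 1944 alignment, as printed in the text.
summary_1944 = (
    "Identities = 269/611 (44%) , Positives = 392/611 (64%), "
    "Gaps = 31/611 (5%)"
)
stats_1944 = parse_alignment_summary(summary_1944)

for name, (count, total, reported) in stats_1944.items():
    consistent = truncated_percent(count, total) == reported
    print(name, count, total, reported, consistent)
```

A check of this kind can help flag summary lines in which a digit was misread during scanning, since the count, the aligned length and the percentage must agree with one another.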

Example 1945 

A DNA sequence (GBSx2054) was identified in S.agalactiae <SEQ ID 6027> which encodes the amino 
acid sequence <SEQ ID 6028>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood =-10.88 Transmembrane 242 - 258 ( 235 - 263) 
INTEGRAL Likelihood = -9.82 Transmembrane 159 - 175 ( 129 - 177) 
INTEGRAL Likelihood = -9.71 Transmembrane 52 - 68 ( 49 - 77) 
INTEGRAL Likelihood = -8.49 Transmembrane 134 - 150 ( 129 - 158) 
INTEGRAL Likelihood = -1.17 Transmembrane 272 - 288 ( 272 - 289) 

----- Final Results ----- 

bacterial membrane Certainty=0.5352 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 
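
The INTEGRAL lines reported for this sequence list predicted transmembrane segments together with a likelihood value and a wider range in parentheses. The following Python sketch collects them in sequence order. It assumes the line layout shown for Example 1945 and covers only the lines as they appear in this text; `parse_integral_lines` is a hypothetical helper name.

```python
import re

# Assumed layout:
# "INTEGRAL Likelihood = <score> Transmembrane <start> - <end> ( <lo> - <hi>)"
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-+\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral_lines(text):
    """Return predicted segments sorted by start position."""
    segments = []
    for match in TM_RE.finditer(text):
        score, start, end, low, high = match.groups()
        segments.append(
            {
                "likelihood": float(score),
                "start": int(start),
                "end": int(end),
                "range": (int(low), int(high)),
            }
        )
    return sorted(segments, key=lambda seg: seg["start"])


# Lines copied from Example 1945.
EXAMPLE_1945 = """
INTEGRAL Likelihood =-10.88 Transmembrane 242 - 258 ( 235 - 263)
INTEGRAL Likelihood = -9.82 Transmembrane 159 - 175 ( 129 - 177)
INTEGRAL Likelihood = -9.71 Transmembrane 52 - 68 ( 49 - 77)
INTEGRAL Likelihood = -8.49 Transmembrane 134 - 150 ( 129 - 158)
INTEGRAL Likelihood = -1.17 Transmembrane 272 - 288 ( 272 - 289)
"""

segments_1945 = parse_integral_lines(EXAMPLE_1945)
for seg in segments_1945:
    print(seg["start"], seg["end"], seg["likelihood"])
```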



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB69751 GB:AL137187 putative ABC transporter [Streptomyces 
coelicolor A3 (2) ] 

Identities = 226/565 (40%) , Positives = 342/565 (60%) , Gaps = 1/565 (0%) 

Query: 6 SYLKRYPNWLWLDLLGAMLFVTVILGMPTAIAGMIDNGVTKGDRTGVYLWTFIMFIFAA/L 65 

+YL+ Y + L + L L +PT A +ID GV KGD + + +M + 

Sbjct: 8 TYLRPYKKPIALLVALQFLQTCASLYLPTLNAHIIDEGWKGDSGYILSYGALMIGISLA 67 

Query: 66 GIIGRITMAYASSRLTTTMIRDMRNDMYAKLQEYSHHEYEQIGVSSLVTRMTSDTFVLMQ 125 

++ I + +R + RD+R ++ ++Q +S E G SL+TR T+D + 
Sbjct: 68 QWCNIGAVFYGARTAAALGRDVRGAVFDRVQSFSAREVGHFGAPSLITRTTNDVQQVQM 127 

Query: 126 FAEMSLRLGLVTPMVMIFSWMILITSPSLAWLVAVAMPLLVGVILYVAIKTKPLSERQQ 185 

A M+ L + P++ + +VM L L+ ++ +P+L + + K +PL + Q 

Sbjct: 128 IALMTFTLWSAPIMCVGGIVMALGLDVPLSGVLLGWPvXAlCVTLIWKLRPLFRKMQ 187 

Query: 186 TMLDKINQYVRENLTGLRVVRAFARENFQSQKFQVANQRYTDTSTGLFKLTGLTEPLFVQ 245 

LD +N+ +RE +TG RV+RAF R+ ++ Q+F+ AN T+ + G L L P+ + 
Sbjct: 188 TOLDTVNRVLREQITGNRVIFAFWDEYEQQRFRKAOT^ 247 

Query: 246 IIIAMIVAIVWFALDPLQRGAIKIGDLVAFIEYSFHALFSFLLFANLFTMYPRMVVSSHR 305 

++ +A+VWF + G ++IGDL AF+ Y + S ++ +F M PR V + R 
Sbjct: 248 VVNLSSIAvWFGAHRIDSGGMQIGDLTAFIAYLMQIVMSVMM 307 

Query: 306 IREVMDMPISINPNTEGVTDTKLKGHLEFDNVTFAYPGETESPVLHDISFKAKPGETIAF 365 

I+EV++ S+ P VT+ + GHLE F YPG E PVL I A+PGET A 

Sbjct: 308 IQEVLETESSWPPVAPVTELRRHGHLE1REAGFRYPG-AEEPVLRHIDLVARPGETTAV 366 

Query: 366 IGSTGSGKSSLVNLIPRFYDVTLGKILVDGVDVRDYNLKSLRQKIGFIPQKALLFTGTIG 425 

IGSTGSGKS+L+ L+PR +D T G++LV+GVDVR + K+L + + +PQK LF GT+ 
Sbjct: 367 IGSTGSGKSTLLGLVPRLFDATDGEVLWGVDTOTVDPKTLAKVVSLVPQKPYLFAGTVA 426' 

Query: 426 ENLKYGKADAT I DDLRQA VD I SQAKE FIESHQEAFETHLAEGGSNLSGGQKQRLS I ARAV 485 

NL+YG DAT ++L A+ ++QAKEF+ + + +A+GG+N+SGGQ+QRL+IAR + 
Sbjct: 427 TNLRYGNPDATDEELWHALAVAQAKEFVSELEGGLDAPIAQGGTNVSGGQRQRLAIARTL 486 

Query: 486 VKDPDLYIFDDSFSALDYKTDATLRARLKEVTGDSTVLIVAQRVGTIMDADQIIVLDEGE 545 

V+ P++Y+FDDSFSALDY TDA LRA L + T ++TV+IVAQRV TI DAD+I+VLDEG 
Sbjct: 487 VQRPEIYLFDDSFSALDYATDAALRAELAQETAEATWIVAQRVATIRDADRIWLDEGR 546 

Query: 546 IVGRGTHAQLIENNAIYREIAESQL 570 

+VG G H +L+ +N YREI SQL 
Sbjct: 547 WGVGRHHELMADNETYRE I VLSQL 571 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4985> which encodes the amino acid 
sequence <SEQ ID 4986>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have an uncleavable N-term signal seq 
INTEGRAL 
bacterial membrane Certainty=0.7496 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 175/511 (34%) , Positives = 296/511 (57%) , Gaps = 3/511 (0%) 

Query: 59 MFIFWLGIIGRITMAYASSRLTTTMIRDMRNDMYAKLQEYSHHEYEQIGVSSLVTRMTS 118 
+ I +LG++ ++++ + DMR + K+Q++S+ E +LV R+T+ 

Sbjct: 56 LLIIALLGLMSGAINTVLAAKIAQGVSADMREKTFRKIQDFSYANIEAFNAGNLVVRLTN 115 

Query: 119 DTFVLMQFAEMSLRLGLVTPMVMIFSVVMILITSPSLAWLVAVAMPLLVGVILYVAIKTK 178 

D + M ++ P++ I + +M + T P L W++ V + L+ ++ V + 

Sbjct: 116 DINQIQSLVMMMFQILFRLPILFIGAFIMAVQTFPQLWWVIVVMVILIALIMGLVMRQMG 175 

Query: 179 PLSERQQTMLDKINQYVRENLTGLRVVRAFARENFQSQKFQVANQRYTDTSTGLFKLTGL 238 

P + Q ++DKIN+ +ENL G+RVV++F +E Q KF+ + + + L 

Sbjct: 176 PRFGKFQRLMDKINRIAKENLRGVRVVKSFVQEQQQYTKFKETSNDLLALNLSIGYGFSL 235 

Query: 239 TEPLFVQIIIAMIVAIVWFALDPLQRGAIKIGDLVAFIEYSFHALFSFLLFANLFTMYPR 298 

+P + + + + ++ IG++ +F+ Y +FS ++ ++ R 

Sbjct: 236 MQPALMLVSYIAVYVS INWSTMVETDPTVIGNIASFMTYMMQIMFS 1 1 WGSMGMQVSR 295 

Query: 299 MVVSSHRIREVMDMPISINPNTEGVTDTKLKGHLEFDNVTFAYPGETESPVLHDISFKAK 358 
VS RIR+++ ++ E + + G + FD+V+F YP + E P L ISF + 

Sbjct: 296 AFVSMARIRQILSTEPAMTFENE- - KEET I SGS I VFDDVSFTYPNDDE - PTLKHI SFAIE 352 

Query: 359 PGETIAFIGSTGSGKSSLVNLIPRFYDVTLGKILVDGVDVRDYNLKSLRQKIGFIPQKAL 418 
PG+ + +G+TGSGKS+L LIPR +D G+IL+ G ++ + +LRQ + + QKA+ 
Sbjct: 353 PGQMVGIVGATGSGKSTLAQLIPRLFDPQDGQILLGGKPIKTLSQTTLRQSVSIVLQKAI 412 

Query: 419 LFTGTIGENLKYGKADATIDDLRQAVDISQAKEFIESHQEAFETHLAEGGSNLSGGQKQR 478 

LF+GTI +NL+ G A A ID +++A I+QAKEFI+ +E+ + E GSNLSGGQKQR 

Sbjct: 413 LFSGTIADNLRQGSAKADIDAMQKAAQIAQAKEFIDRMDSRYESQVEERGSNLSGGQRQR 472 

Query: 479 LSIARAVVKDPDLYIFDDSFSALDYKTDATLRARLKEVTGDSTVLIVAQRVGTIMDADQI 538 

LSIAR V+ P + I DDS SALD K++ ++ L +T +IVAQ++ +++ AD+I 

Sbjct: 473 LSIARGVINHPKILILDDSTSALDAKSEKRVQEALSHKLEGTTTVIVAQKISSWKADKI 532 

Query: 539 IVLDEGEIVGRGTHAQLIENNAIYREIAESQ 569 

+VLD+G+++G GTHA+L+ NNAIYREI E+Q 
Sbjct: 533 LVLDQGQLIGEGTHAELVANNAIYREIYETQ 563 

There is also homology to SEQ IDs 72 and 6552. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1946 

A DNA sequence (GBSx2055) was identified in S.agalactiae <SEQ ID 6029> which encodes the amino 
acid sequence <SEQ ID 6030>. Analysis of this protein sequence reveals the following: 

Possible site: 24 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2391 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA51784 GB:X73368 ORF 18.3 [Salmonella typhimurium] 
Identities = 58/162 (35%), Positives = 92/162 (55%), Gaps = 8/162 (4%) 

Query: 1 MIIRPIIKNDDQAVAQLIRQSLRAYDL--DKPDTAYSDPHLDHLTSYYEKIEKSGFFVIE 58 

+ +R I D+ A+A++IRQ Y L DK T +DP+LD L ' Y + + ++V+E 
Sbjct: 9 LTTORITTADNAAIARVIRQVSAEYGLTADKGYTV-ADPNLDELYQVYSQ-PGAAYWVVE 66 

Query: 59 ERDEIIGCGGFGPLKNL IAEMQKVYIAERFRGKGLATDLVKMIEVEARKIGYRQLYL 115 

+ ++G GG PL I E+QK+Y RG+GLA L M AR+ G+++ YL 

Sbjct: 67 QNGCWGGGGVAPLSCSEPDICELQKMYFLPVIRGQGIiAKKIjyijMALDHAREQGFKRCYL 126 

Query: 116 ETASTLSRATAVYKHMGYCALSQPIANDQGHTAMDIWMIKDL 157 

ET + L A A+Y+ +G+ +S+P+ GH ++ M+KDL 
Sbjct: 127 ETTAFLREAIALYERLGFEHISEPL-GCTGHVDCEVRMLKDL 167 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1947 

A DNA sequence (GBSx2056) was identified in S.agalactiae <SEQ ID 6031> which encodes the amino 
acid sequence <SEQ ID 6032>. This protein is predicted to be ABC transporter. Analysis of this protein 
sequence reveals the following: 

Possible site: 25 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1738 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12566 GB:Z99108 similar to ABC transporter (ATP-binding 
protein) [Bacillus subtilis] 
Identities = 269/625 (43%) , Positives = 397/625 (63%) , Gaps = 11/625 (1%) 

Query: 1 MSDFLVDGLTKSVGDKTVFSNVSFIIHSLDRIGIIGVNGTGKTTLLDVISGELGFDGDRS 60 
MS + L K+ GDKT+F ++SF I +RIG+IG NGTGK+TLL VI+G + 
Sbjct: 1 MSILKAENLYKTYGDKTLFDHISFHIEENERIGLIGPNGTGKSTLLKVIAGLESIE--EG 58 

Query: 61 PFSSANDYKIAYLKQEPDFDDSQTILDTVLSSDLREMALIKEYELLLNirf EESKQ 115 
+ + ++ +L Q+P+ QT+L+ + S + M ++EYE L E +Q 
Sbjct: 59 [...] 

Query: 116 [...] 
A+MD+ +AW + KTVLSKLG+ D+ V ELSGG ++RV +A+ L+ AD 
Sbjct: 119 [...] 

Query: 176 [...] 
LL+LDEPTNHLD +TI WL +L V+ +THDRYFL+ V RI+EL++ + Y+G 
Sbjct: 179 [...] 

Query: 236 [...] 
NY+ ++ RAE++ + K++ L ++ELAW+R +AR+TKQ+ARI+R + LK 
Sbjct: 239 [...] 

Query: 296 [...] 
S S L+ + R+GK+VI ENV +Y + ++ FN L+ +RIGI+G NG+GK 
Sbjct: 299 [...] 

Query: 356 [...] 
+TLLN + PD G+++IG+T+R+GY++Q M+G +VI+Y++E A+ VKT+ G 
Sbjct: 358 TTLLNALAGRHTPDGGDITIGQTVRIGYYTQDHSEMNGELKVIDYIKETAEVVKTADGDM 417 

Query: 416 SVTE-LLEQFLFPRSTHGTQIAKLSGGEKKRLYLLKILIEKPNVLLLDEPTNDLDIATLT 474 

E +LE+FLFPRS T I KLSGGEK+RLYLL++L+++PNVL LDEPTNDLD TL+ 
Sbjct: 418 ITAEQMLERFLFPRSMQQTYIRKLSGGEKRRLYLLQVLMQEPNVLFLDEPTNDLDTETLS 477 

Query: 475 VLENFLQGFGGPVITVSHDRYFLDKVANKIIAFEDND-IREFFGNYTDYLDEKAFNEQNN 533 

VLE+++ F G VITVSHDRYFLD+V +++I FE N I F G+Y+DY++E + 
Sbjct: 478 VLEDYIDQFPGVVITVSHDRYFLDRVVDRLIVFEGNGVISRFQGSYSDYMEESKAKKAAP 537 

Query: 534 EVISKKESTKTSREKQSRKRMSYFEKQEWATIEDDIMILENTITRIENDMQTCGSDFTRL 593 

+ + +E T + K+ RK++SY ++ EW IED I LE ++E D+ GSDF ++ 
Sbjct: 538 KP-AAEEKTAEAEPKKKRKKLSYKDQLEWDGIEDKIAQLEEKHEQLEADIAAAGSDFGKI 596 

Query: 594 SDLQKELDAKNEALLEKYDRYEYLS 618 

+L E E L DR+ LS 

Sbjct: 597 QELMAEQAKTAEELEAAMDRWTELS 621 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6033> which encodes the amino acid 
sequence <SEQ ID 6034>. Analysis of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2591 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 467/624 (74%) , Positives = 535/624 (84%) , Gaps = 3/624 (0%) 

Query: 1 MSDFLVDGLTKSVGDKTVFSNVSFIIHSLDRIGIIGVNGTGKTTLLDVISGELGFDGDRS 60 

MS FLV+ LTK+VGDKTVF ++SFIIH DRIGIIGVNGTGKTTLLDV+SG LGFDGD S 
Sbjct: 1 MSHFLVEKLTKTVGDKTVFQDISFIIHDFDRIGIIGVNGTGKTTLLDVLSGRLGFDGDHS 60 


Query: 61 PFSSANDYKIAYLKQEPDFDDSQTILDTVLSSDLREMALIKEYELLLNHYEESKQSRLEK 120 

PFS ANDYKIAYL Q+P+F+D+ ++LDTVLS+D++ + LI++YELL+ +Y E KQ LE 
Sbjct: 61 PFSKANDYKIAYLTQDPEETOAASVLDTVLSADVKAIQLIRQYELLMANYTEDKQESLES 120 

Query: 121 VMAEMDSLDAWSIESEVKTVLSKLGITDLQLSVGELSGGLRRRVQLAQVLLNDADLLLLD 180 

+M+EMD LDAWSIES+VKTVLSKLGITDL+ VG+LSGG+RRRVQLAQVLL ADLLLLD 
Sbjct: 121 LMSEMDRLDAWSIESDVKTVLSKLGITDLEQKVGDLSGGMRRRVQLAQVLLGAADLLLLD 180 

Query: 181 EPTNHLDIDTIAWLTNFLKNSKKTVLFITHDRYFLDNVATRIFELDKAQITEYQGNYQDY 240 
EPTNHLDIDTIAWLT +LK +KKTVLFITHDRYFLD+VATRIFELDKA +TEYQGNYQDY 

Sbjct: 181 EPTNHLDIDTIAWLTTYLKTAKKTVLFITHDRYFLDHVATRIFELDKAGLTEYQGNYQDY 240 

Query: 241 VRLRAEQDERDAASLHKKKQLYKQELAWMRTQPQARATKQQARINRFQNLKNDLHQTSDT 300 
VRL+AEQDERDAA+LHKKKQLYKQELAWMRTQPQARATKQQARINRF +LK ++HQ S 
Sbjct: 241 VRLKAEQDERDAANLHKKKQLYKQELAWMRTQPQARATKQQARINRFSDLKKEVHQDSSA 300 

Query: 301 SDLEMTFETSRIGKKVINFENVSFSYPDKSILKDFNLLIQNKDRIGIVGDNGVGKSTLLN 360 

LEMTFETSRIGKKVI+FE++SF+Y D+ ++KDFNL+IQNKDRIGIVGDNGVGKSTLLN 
Sbjct: 301 DKLEMTFETSRIGKKVIHFEDLSFAYGDRQLIKDFNLIIQNKDRIGIVGDNGVGKSTLLN 360 

Query: 361 LIVQDLQPDSGNVSIGETIRVGYFSQQLHNMDGSKRVINYLQEVADEVKTSVGTTSVTEL 420 

+I DL+P SG + IG+TIRVGYFSQQL +MD +KRVINYLQEVADEVKTSVGTTS++EL 
Sbjct: 361 IINGDLKPTSGKLDIGDTIRVGYFSQQLKDMDETKRVINYLQEVADEVKTSVGTTSISEL 420 

Query: 421 LEQFLFPRSTHGTQIAKLSGGEKKRLYLLKILIEKPNVLLLDEPTNDLDIATLTVLENFL 480 

LEQFLFPRS+HGT IAKLSGGEKKRLYLLK+LIEKPNVLLLDEPTNDLDIATL VLENFL 
Sbjct: 421 LEQFLFPRSSHGTLIAKLSGGEKKRLYLLKLLIEKPNVLLLDEPTNDLDIATLKVLENFL 480 

Query: 481 QGFGGPVITVSHDRYFLDKVANKIIAFEDNDIREFFGNYTDYLDEKAFNEQNNEVISKKE 540 
F GPVITVSHDRYFLDKVA KI+AFE+ DIR F+GNY+DYLDEK F ++ E K 
Sbjct: 481 ANFAGPVITVSHDRYFLDKVATKIIAFEKGDIRVFYGNYSDYLDEKVFEKETVEADLAKT 540 

Query: 541 STKTS---REKQSRKRMSYFEKQEWATIEDDIMILENTITRIENDMQTCGSDFTRLSDLQ 597 

+ +K+ RKRMSY EKQEWA IED I +E I IEN M T SD+ +L+ LQ 

Sbjct: 541 TVTEEVPLPQKEERKRMSYLEKQEWAQIEDKIATIEANIEEIENQMLTVVSDYGQLAQLQ 600 

Query: 598 KELDAKNEALLEKYDRYEYLSELD 621 

ICELD +N LL Y+R+EYLS LD 
Sbjct: 601 KELDQRNNDLLLAYERFEYLSGLD 624 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1948 

A DNA sequence (GBSx2057) was identified in S.agalactiae <SEQ ID 6035> which encodes the amino 
acid sequence <SEQ ID 6036>. This protein is predicted to be poly(a) polymerase (papS). Analysis of this 
protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2658 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9957> which encodes amino acid sequence <SEQ ID 9958> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB38446 GB:L47709 poly(A) polymerase [Bacillus subtilis] 
Identities = 157/395 (39%) , Positives = 235/395 (58%) , Gaps = 14/395 (3%) 



Query: 11 FQKALPILKKIKKAGYEAYFVGGSVRDVLLDRPIHDVDIATSSYPEETKQIFKRTVDVGI 70 
F KALP+L+ + +AG++AYFVGG+VRD + R I DVDIAT + P++ +++F+RTVDVG 
Sbjct: 5 FIKALPVLRILIFAGHQAYFVGGAVRDSYMKRTIGDVDIATDAAPDQVERLFQRTVDVGK 64 

Query: 71 EHGTVLVLEKGGEYEITTFRTEEVYVDYRRPSQVNFVRSLEEDLKRRDFTVNAFALNEDG 130 
EHGT++VL + YE+TTFRTE YVD+RRPS+V F+ SLEEDLKRRD T+NA A+ DG 
Sbjct: 65 EHGTIIVLWEDETYEVTTFRTESDYVDFRRPSEVQFISSLEEDLKRRDLTINAMAMTADG 124 

Query: 131 EVIDLFHGLDDLDNHLLRAVGLASERFNEDALRIMRGLRFSASLNFDIETTTFEAMKKHA 190 
+V+D F G D+D ++R VG +RF EDALR++R +RF + L F + T EA+ K 
Sbjct: 125 KVLDYFGGKKDIDQKVIRTVGKPEDRFQEDALRMLRAVRFMSQLGFTLSPETEEAIAKEK 184 

Query: 191 SLLEKISVERSFIEFDKLLLAPYWRKGMLALIDSHAFNYLPCLKNRELQLSAFLSQLDKD 250 
SLL +SVER IEF+KLL R+ + LI + + LP ++ L +S + 
Sbjct: 185 SLLSHVSVERKTIEFEKLLQGRASRQALQTLIQTRLYEELPGFYHKRENL---ISTSEFP 241 

Query: 251 FLFETS-EQAWASLILSMEV--EHTKTFLKKWKTSTHFQKDVEHIVDVYRIREQMGLTKE 307 
F TS E+ WA+L++++ + + FLK WK K+ HI D + L 
Sbjct: 242 FFSLTSREELWAALLINLGIVLKDAPLFLKAWKLPGKVIKEAIHIADTF----GQSLDAM 297 

Query: 308 HLYRYGKTIIKQAEGIRKAR-GLMVDFEKIEQLD---SELAIHDRHEIVVNGGTLIKKLG 363 
+YR GK + A I + R +D +K++ + LI ++ + G L+ 
Sbjct: 298 TMYRAGKKALLSAAKISQLRQNEKLDEKKLKDIQYAYQNLPIKSLKDLDITGKDLLALRN 357 

Query: 364 IKPGPQMGDIISQIELAIVLGQLINEEEAILHFVK 398 
G + + + IE A+V G+L N+++ I ++K 
Sbjct: 358 RPAGKWVSEELQWIEQAVVTGKLSNQKKHIEEWLK 392 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6037> which encodes the amino acid 
sequence <SEQ ID 6038>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2023 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 256/400 (64%) , Positives = 312/400 (78%) 

Query: 2 MRLNYLPSEFQKALPILKKIKKAGYEAYFVGGSVRDVLLDRPIHDVDIATSSYPEETKQI 61 
M+L +PSEFQKALPIL KIK+AGYEAYFVGGSVRDVLL+RPIHDVDIATSSYPEETK I 

Sbjct: 1 MKLMTMPSEFQKALPILTKIKEAGYEAYFVGGSVRDVLLERPIHDVDIATSSYPEETKAI 60 

Query: 62 FKRTVDVGIEHGTVLVLEKGGEYEITTFRTEEVYVDYRRPSQVNFVRSLEEDLKRRDFTV 121 
F RTVDVGIEHGTVLVLE GGEYEITTFRTE++YVDYRRPSQV+FVRSLEEDLKRRDFTV 
Sbjct: 61 FNRTVDVGIEHGTVLVLENGGEYEITTFRTEDIYVDYRRPSQVSFVRSLEEDLKRRDFTV 120 

Query: 122 NAFALNEDGEVIDLFHGLDDLDNHLLRAVGIASERFNEDALRIMRGLRFSASLNFDIETT 181 

NA AL+E+G+VID F GL DL LRAVG A ERF EDALRIMRG RF+ASL+FDIE 
Sbjct: 121 NALALDENGQVIDKFRGLIDLKQKRLRAVGKAEERFEEDALRIMRGFRFAASLDFDIEAI 180 


Query: 182 TFEAMKKHASLLEKISVERSFIEFDKLLIAPYWRKGMLALIDSHAFNYLPCLKNRELQLS 241 

TFEAM+ H+ LLEKISVERSF EFDKLL+AP+WRKG+ A+I A++YLP LK +E L+ 
Sbjct: 181 TFEAMRSHSPLLEKISVERSFTEFDKLLMAPHWRKGISAMIACQAYDYLPGLKQQEAGLN 240 

Query: 242 AFLSQLDKDFLFETSEQAWASLILSMEVEHTKTFLKKWKTSTHFQKDVEHIVDVYRIREQ 301 

+ L +F F QAWA +++S+ +E K+FLK WKTS FQ+ V ++ +YRIR++ 
Sbjct: 241 HLIVSLKDNFTFSDYHQAWAYVMISIAIEDPKSFLKAWKTSNDFQRYVTKLIALYRIRQE 300 

Query: 302 MGLTKEHLYRYGKTIIKQAEGIRKARGLMVDFEKIEQLDSELAIHDRHEIVVNGGTLIKK 361 
K +Y+YGK + E +RKA+ L VD + + I LD L IHD+H+IV+NG LIK 

Sbjct: 301 RSFEKLDIYQYGKKMASLVEDLRKAQSLSVDMDRINTLDQALVIHDKHDIVLNGSHLIKD 360 

Query: 362 LGIKPGPQMGDIISQIELAIVLGQLINEEEAILHFVKQYL 401 
G+K GPQ+G ++ ++ELAIV G+L N+ I FV++ L 
Sbjct: 361 FGMKSGPQLGLMLEKVELAIVEGRLDNDFTTIEAFVREEL 400 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1949 

A DNA sequence (GBSx2058) was identified in S.agalactiae <SEQ ID 6039> which encodes the amino 
acid sequence <SEQ ID 6040>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2939 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07346 GB:AP001519 unknown conserved protein [Bacillus halodurans] 
Identities = 94/274 (34%) , Positives = 153/274 (55%) , Gaps = 2/274 (0%) 



Query: 2 KLALITDTSAYLPEAIENHEDVYVLDIPIIIDGKTYIEGQNLTLDQYYDKLAASKELPKT 61 
K+A++TD++AYL V V+ + ++ + Y E L+ +Y+KL ++LP T 
Sbjct: 3 KIAIVTDSTAYLGPKRAKELGVIVVPLSVVFGEEAYQEEVELSSADFYEKLKHEEKLPTT 62 

Query: 62 [...] 
SQP++ + +L KEG+ V+ + +++ ISG +Q+ + + D+ I+ 
Sbjct: 63 [...] 

Query: 122 [...] 
PQ N V A +EG D I++ + ++ W+DL+HL +GGRL+ ++G 
Sbjct: 123 [...] 

Query: 182 [...] 
+LL IKP+LHF E+G IV EKVRTEKKA R+ E+ +E ++ +IH+ D A 
Sbjct: 183 [...] 

Query: 241 [...] 
E+L + + D+ I FG VI THLGEG++ 
Sbjct: 242 EKLADEIRSQFSHVDVSISHFGPVIGTHLGEGSI 275 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6041> which encodes the amino acid 
sequence <SEQ ID 6042>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3379 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 181/281 (64%) , Positives = 233/281 (82%) 

Query: 1 MKLALITDTSAYLPEAIENHEDVYVLDIPIIIDGKTYIEGQNLTLDQYYDKLAASKELPK 60 
MKLA+ITD++A LP ++ + ++ LDIP+IID +TY EG+NL++D +Y K+A S+ LPK 
Sbjct: 1 MKLAVITDSTATLPTDLKQDKAIFSLDIPVIIDDETYFEGRNLSIDDFYQKMADSQNLPK 60 

Query: 61 TSQPSLAELDDLLCQLEKEGYTHVLGLFIAAGISGFWQNIQFLIEEHPNLTIAFPDTKIT 120 
TSQPSL+ELD+LL L +GYTHV+GLF+A GISGFWQNIQFL EEHP + +AFPD+KIT 
Sbjct: 61 [...] 

Query: 121 [...] 
SAP G++V+N L SR+GM F I+NK+Q QI+ FI+V+DLNHLVKGGRLSNGSA++ 
Sbjct: 121 [...] 

Query: 181 [...] 
GNLLSIKP+L F+EEGKIWYEKVRTEKKR+KRL EI+ ++ ADG+Y++ IIHS+AQDKA 
Sbjct: 181 [...] 

Query: 241 [...] 
+ L LL +G + D+E V FG VIATHLGEGA+AFG+TP+ 
Sbjct: 241 DYLKRLLQDSGYQYDIEEVHFGAVIATHLGEGAIAFGVTPR 281 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1950 

A DNA sequence (GBSx2059) was identified in S.agalactiae <SEQ ID 6043> which encodes the amino 
acid sequence <SEQ ID 6044>. Analysis of this protein sequence reveals the following: 

Possible site: 46 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -1.59 Transmembrane 51 - 67 ( 50 - 67) 

Final Results 

bacterial membrane Certainty=0.1638 (Affirmative) < succ> 
bacterial outside Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 



The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6045> which encodes the amino acid 
sequence <SEQ ID 6046>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -3.19 Transmembrane 50 - 66 ( 49 - 67) 



The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below. 

Identities = 94/126 (74%) , Positives = 115/126 (90%) 

Query: 1 MEVIREQEFVNQYHYDARNLEWEEENGTPKINFEvTFQLANRDEAAKATTSIVAVLQFVIV 60 

M+++RE+EFVNQYHYDARNLEWE+ENGTP+TNFEVTFQL ++DE K T IV+VLQFVIV 
Sbjct: 1 MQLVREKEFVNQYHYDARNLEWEKENGTPETNFEVTFQLIDKDEQQKETVIVSVLQFVIV 60 

Query: 61 RDEFVISGVISQMAHIQGRLINEPSEFSQDEVENLAAPLLEIVKRLTYEVTEIALDRPGV 120 

++EFVISGVISQM I RL+++PSEF+Q+EVE+LAAPLL++VKRLTYETCEIALDRPG+ 
Sbjct: 61 KEEFVISGVISQWVRILDRLvDKPSEFTQEEVESLAAPLLD^ 120 

Query: 121 TLEFNS 126 
LEF + 

Sbjct: 121 HLEFKN 126 

SEQ ID 6044 (GBS416) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 79 (lane 4; MW 17.5kDa). 

GBS416-His was purified as shown in Figure 214, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1951 

A DNA sequence (GBSx2060) was identified in S.agalactiae <SEQ ID 6047> which encodes the amino 
acid sequence <SEQ ID 6048>. Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence 



Final Results 



bacterial membrane Certainty=0.2275 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



Final Results 



bacterial cytoplasm Certainty=0.3875 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1952 

A DNA sequence (GBSx2061) was identified in S.agalactiae <SEQ ID 6049> which encodes the amino 
acid sequence <SEQ ID 6050>. Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1953 

A DNA sequence (GBSx2062) was identified in S.agalactiae <SEQ ID 6051> which encodes the amino 
acid sequence <SEQ ID 6052>. This protein is predicted to be PTS system, fructose-specific enzyme II, 
BC component (fruA-1). Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial membrane --- Certainty=0.5225 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 



A related GBS nucleic acid sequence <SEQ ID 9959> which encodes amino acid sequence <SEQ ID 9960> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04547 GB:AP001510 PTS system, fructose-specific enzyme II, BC 
component [Bacillus halodurans] 
Identities = 320/659 (48%), Positives = 438/659 (65%), Gaps = 46/659 (6%) 

Query: 1 MKIQDLLKKEVMIMDLKATSKEAAIDEMITKLVDTGVVTNFAIFKDGIMKREAQTSTGLG 60 

+KI +LLKK+ M+++L+A SKEA IDE++ L G + + FK I++RE+Q++TG+G 
Sbjct: 2 LKISELLKKDTMVLNLRAASKEAVIDELVRTLDKAGRLNDAQAFKRAILERESQSTTGVG 61 

Query: 61 DGIAMPHSKNAAVKEATVLFAKSASGVDYEALDGQPTDLFFMIAAPDGANDTHLAALAEL 120 

+GIA+PH+K AAVK+ + F +S +G+DYE+LDGQP+ LFFMIAA +GAN+ HL L+ L 
Sbjct: 62 EGIAIPHAKTAAVKQPAIAFGRSDAGIDYESLDGQPSHLFFMIAASEGANNEHLETLSRL 121 

Query: 121 SKYLLKEGFADQLRQAKTPDDIIATFDSNSISQEWAPQWQSTSKGSDYIVAVTACTTG 180 

S +L+ E F L +A++ D+I+A D +E + +G + ++AVT C TG 

Sbjct: 122 STFLMDETFRSTLMKAQSEDE I LAAID KKEAETAGEAEEKQEGYE - LLAVTGCPTG 176 

Query: 181 IAHTYMAEEALKKKAAEMGVGIKVETNGASGVGNKLTSSDIARAKGVIIAADKAVEMDRF 240 

IAHTYMA + LK KA E+GV IKVETNG+ GV N+LT +I+ AK +I+AAD VEMDRF 
Sbjct: 177 IAHTYMAADNLKSKAQELGVSIKVETNGSGGVKNRLTDEEISAAKAIIVAADTKVEMDRF 236 

Query: 241 DGKPLVSRPVADGIKKSEDLINIILDNKAQTYHAKNQNDKQSGESDGKSGLGS AFYK 297 

GKP++ PV DGI++ ++LI+ L KA Y + Q+ DG +G G FYK 
Sbjct: 237 HGKPVIQVPVTDGIRRPKELIDQALAGKAPVY EGGAQASGEDGSAGGGRPKLGFYK 292 

Query: 298 HLMGGVSQMLPFVIGGGIMIAIAFLFDNILGVPKDQLSNLGSYHEIAALFKNIGGA-AFA 356 

HLM GVS MLPFV+GGGI+IAI+F+F P D SYH A + IGG AF 

Sbjct: 293 HLMNGVSNMLPFWGGGILIAISFMFGIKAFDPSDP SYHPFAEMLMTIGGGNAFG 347 

Query: 357 FMLPVLAGYIAYSIAEKPGLVAGFVAGSIASSGLAFGKVPFAEGGKATLALAGVPSGFLG 416 

M+PVLA +IA SIA++PG AG + G IAS+G A GFLG 
Sbjct: 348 LMIPVLAAFIAMSIADRPGFAAGMIGGLIASTGEA---------------------GFLG 386 

Query: 417 ALVGGFLAGGVILLLRKLLSGLPKSLEGIKSILLYPLLGVLITGFLMLLVNIPMAAINTA 476 

L+ GFLAG V L ++K+L+ LP++L+GIK+IL YP+ + ITG +ML++ P+AA NT 
Sbjct: 387 GLIAGFLAGWALGVKKOTjANLPQTLDGIKTILFYPVFNIFITGMIMLVIVGPLAAFNTG 446 

Query: 477 LNTFLQGLSGSSAVLMGLLVGGMMAVDMGGPVNKAAWFGTGTIAATOANGGSVVMAAVM 536 

L +L + ++ V++G+++GGMMAVDMGGP+NKAA+ FG + A G AAVM 
Sbjct: 447 LQDWLGSMGTANMVILGVILGGMMAVDMGGPINKAAFTFGIAMIDA GNFGPHAAVM 502 

Query: 537 AGG^^VPPLAVFVATLLFKDKFNNEERQSGIJTNIVMGLSFITEGAIPFGAADPARAIPSFI 596 

AGGMVPPL + +AT LFK KF +ER++G TN ++G SFITEGAIPF AADP R IPS I 
Sbjct: 503 AGGMVPPLGIAIATTLFKKKFTKQEREAGKT^ILGASFITEGAIPFAAADPGRVIPSII 562 

Query: 597 VGSALTGALVGLAGIKLMAPHGGIFVI ALTSNPLLYILFILIGAWSGVLFGLFRK 652 

VGSA G L L + L APHGG FVI + +NPLLY++ 1+ G++V+ +L G ++K 
Sbjct: 563 VGSAFAGGLTALFNVTLSAPHGGAFVIFIGNIVNNPLLYLVAI IAGSI VTALLLGFWKK 621 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6053> which encodes the amino acid 
sequence <SEQ ID 6054>. Analysis of this protein sequence reveals the following: 

Possible site: 18 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial membrane Certainty=0.5310 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:BAB04547 GB:AP001510 PTS system, fructose-specific enzyme II, BC 
component [Bacillus halodurans] 
Identities = 322/659 (48%) , Positives = 431/659 (64%) , Gaps = 48/659 (7%) 

Query: 1 MKIQDLLRKDIMILDLQAISKEVAIDEMITKLVEKDIVHDFDVFKKSIMTREEQTSTGLG 60 

+KI +LL+KD M+L+L+A SKE IDE++ L + ++D FK++I+ RE Q++TG+G 
Sbjct: 2 LKISELLKKDTMVLNLRAASKEAVIDELVRTLDKAGRLNDAQAFKRAILERESQSTTGVG 61 

Query: 61 DGIAMPHSKNIVVDKPAVLFAKSNKGVDYKALDGQPTDLFFMIAAPQGA1TOTHLAALAEL 120 

+GIA+PH+K V +PA+ F +S+ G+DY++LDGQP+ LFFMIAA +GAN+ Hli L+ L 
Sbjct: 62 EGIAIPHAKTAAVKQPAIAFGRSDAGIDYESLDGQPSHLFFMIAASEGANNEHLETLSRL 121 

Query: 121 SQYLLKDGFADKLRAAATPEAVIAVFD--EASTAKEEWAPTSGQDFIVAVTACPTGIAH 178 

S +L+ + F L A + + ++A D EA TA E + ++AVT CPTGIAH 

Sbjct: 122 STFLMDETFRSTLMKAQSEDEILAAIDKKEAETAGEAEEKQEGYE--LLAVTGCPTGIAH 179 

Query: 179 TYMAEEALKKQAAEMGVAIKVETNGASGVANRLTAEDIQRAKGVIVAADKAVEMDRFDGK 238 
TYMA + LK +A E+GV+IKVETNG+ GV NRLT E+I AK +IVAAD  VEMDRF GK 

Sbjct: 180 TYMAADNLKSKAQELGVS I KVETNGSGGVKNRLTDEEI SAAKAI I VAADTKVEMDRFHGK 239 

Query: 239 QFIARPVADGIKKSQELISLILNNEGNTYHAKNGKSETAVSTEKTSLGG AFYKHL 293 

I PV DGI++ +ELI L+Y + SESGG FYKHL 

Sbjct: 240 PVIQVPVTDGIRRPKELIDQALAGKAPVY EGGAQASGEDGSAGGGRPKLGFYKHL 294 

Query: 294 MGGVSQMLPFVIGGGIMIALAFLLDNMLGVPNDQLGSLGSYHEIAAIFMNIGGA-AFSFM 352 

M GVS MLPFV+GGGI+IA++F+ P+D SYH A + M IGG AF M 

Sbjct: 295 MNGVSNMLPFWGGGI L I AI S FMFGI KAFDPSDP SYHPFAEMLMTIGGGNAFGLM 349 


Query: 353 LPVLAGYIAYSIAEKPGLVAGFVAGAIASNGLAFGKVPFAAGGEVSLGLTGVPSGFLGAL 412 

+PVLA +IA SIA++PG AG + G IAS G A GFLG L 

Sbjct: 350 IPVLAAFIAMSIADRPGFAAGMIGGLIASTGEA GFLGGL 388 

Query: 413 VGGFIAGGVILALRKLLAGLPRSLEGVKSILLYPLLGVLVTGFLMLFVNIPMAAINTALN 472 

+ GFLAG V L ++K+LA LP++L+G+K+IL YP+ + +TG +ML + P+AA NT L 
Sbjct: 389 IAGFLAGWALGVKKVLANLPQTLDGIKTILFYPVFNIFITGMIMLVIVGPLAAFNTGLQ 448 

Query: 473 DFLQGLSGSSAVLMGLLVGGMMAVDMGGPVNKAAWFGTGTLAATVANGGSVVMAAVMAG 532 
D+L + ++ V++G+++GGMMAVDMGGP+NKAA+ FG + A G AAVMAG 

Sbjct: 449 DWIX3SMGTAWMVI LGVI LGGMMA VDMGGPINKAAFTFGIAM I DA GNFGPHAAVMAG 504 

Query: 533 GMVPPLAVFVATLLFKDKFTKEERESGLTNIVMGLSFITEGAIPFGAADPARAIPSFIAG 592 
GMVPPL + +AT LFK KFTK+ERE+G TN ++G SFITEGAIPF AADP R IPS I G 
Sbjct: 505 GMVPPLGIALATTLFKKKFTKQEREAGKTNYILGASFITEGAIPFAAADPGRVIPSIIVG 564 

Query: 593 SALTGALVGLAGIKLMAPHGGIFVI ALTSNPILYLVFWIGALVSGILFGALRKKA 648 

SA G L L + L APHGG FVI + +NP+LYLV ++ G++V+ +L G +K A 
Sbjct: 565 SAFAGGLTALFNVTLSAPHGGAFVIFIGNIVMKPLLYLVAIIAGSIVTALLLGFWKKDA 623 


An alignment of the GAS and GBS proteins is shown below. 

Identities = 526/652 (80%), Positives = 581/652 (88%), Gaps = 6/652 (0%) 

Query: 1 MKIQDLLKKEVMIMDLKATSKEAAIDEMITKLVDTGVVTNFAIFKDGIMKREAQTSTGLG 60 
MKIQDLL+K++MI+DL+A SKE AIDEMITKLV+ +V +F +FK IM RE QTSTGLG 

Sbjct: 1 MKIQDLLRKDIMILDLQAISKEVAIDEMITKLVEKDIVHDFDVFKKSIMTREEQTSTGLG 60 

Query: 61 DGIAMPHSKNAAVKEATVLFAKSASGVDYEALDGQPTDLFFMIAAPDGANDTHLAALAEL 120 
DGIAMPHSKN V + VLFAKS GVDY+ALDGQPTDLFFMIAAP GANDTHLAALAEL 
Sbjct: 61 DGIAMPHSKNIVVDKPAVLFAKSNKGVDYKALDGQPTDLFFMIAAPQGANDTHLAALAEL 120 

Query: 121 SKYLLKEGFADQLRQAKTPDDIIATFDSNSISQETVAPQTVQSTSKGSDYIVAVTACTTG 180 

S+YLLK+GFAD+LR A TP+ +IA FD S ++E V T G D+IVAVTAC TG 

Sbjct: 121 SQYLLKDGFADKLRAAATPEAVIAVFDEASTAKEEWAPT SGQDFIVAVTACPTG 175 


Query: 181 IAHTYMAEEALKKKAAEMGVGIKVETNGASGVGNKLTSSDIARAKGVIIAADKAVEMDRF 240 

IAHTYMAEEALKK+AAEMGV IKVETNGASGV N+LT+ DI RAKGVI +AADKA VEMDRF 
Sbjct: 176 IAHTYMAEEALKKQAAEMGVAIKVETNGASGVANRLTAEDIQRAKGVIVAADKAVEMDRF 235 

Query: 241 DGKPLVSRPVADGIKKSEDLINIILDNKAQTYHAKNQNDKQSGESDGKSGLGSAFYKHLM 300 

DGK ++RPVADGIKKS++LI++IL+N+ TYHAKN ++ S K+ LG AFYKHLM 
Sbjct: 236 DGKQFIARPVADGIKKSQELISLILNNEGNTYHAKN-GKSETAVSTEKTSLGGAFYKHLM 294 

Query: 301 GGVSQMLPFVIGGGIMIAIAFLFDNILGVPKDQLSNLGSYHEIAALFKNIGGAAFAFMLP 360 
GGVSQMLPFVIGGGIMIA+AFL DN+LGVP DQL +LGSYHEIAA+F NIGGAAF+FMLP 

Sbjct: 295 GGVSQMLPFVIGGGIMIALAFLLDNMLGVPNDQLGSLGSYHEIAAIFMNIGGAAFSFMLP 354 

Query: 361 VLAGYIAYSIAEKPGLVAGFVAGSIASSGIAFGKVPFAEGGKATLALAGVPSGFLGALVG 420 

VLAGYIAYSIAEKPGLVAGFVAG+IAS+G+AFGKVPFA GG+ +L L GVPSGFLGALVG 
Sbjct: 355 VLAGYIAYSIAEKPGLVAGFVAGAIASNGLAFGKVPFAAGGEVSLGLTGVPSGFLGALVG 414 

Query: 421 GFIAGGVILLLRKLLSGLPKSLEGIKSILLYPLLGVLITGFLMLLVNIPMAAINTALNTF 480 

GFIAGGVIL LRKLL+GLP+SLEG+KSILLYPLLGVL+TGFLML VNIPMAAINTALN F 
Sbjct: 415 GFIAGGVILALRKLLAGLPRSLEGVKSILLYPLLGVLVTGFLMLFVNIPMAAINTALNDF 474 

Query: 481 LQGLSGSSAVLMGLLVGGMI^VDMGGPWKAAYWGTGTI^WANGGSVVMARVMAGGM 540 
LQGLSGSSAVLMGLLVGGM^VDMGGPVNKAAYVFGTGTLAAWANGGSVVMAAVMAGGM 

Sbjct: 475 LQ^LSGSSAVLMGLLVGG^mAVDMGGPWKAAYVFGTGTLAATVANGGSVVMAAVMAGGM 534 

Query: 541 VPPLAVFVATLLFKDKFNNEERQSGLTNIVMGLSFITEGAIPFGAADPARAIPSFIVGSA 600 
VPPLAVFVATLLFKDKF EER+SGLTNIVMGLSFITEGAIPFGAADPARAIPSFI GSA 
Sbjct: 535 VPPLAVFVATLLFKDKFTKEERESGLTNIVMGLSFITEGAIPFGAADPARAIPSFIAGSA 594 

Query: 601 LTGALVGLAGIKLMAPHGGIFVIALTSNPLLYILFILIGAVVSGVLFGLFRK 652 

LTGALVGLAGIKLMAPHGGIFVIALTSNP+LY++F++IGA+VSG+LFG RK 
Sbjct: 595 LTGALVGLAGIKLMAPHGGIFVIALTSNPILYLVFVVIGALVSGILFGALRK 646 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
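
The Identities and Gaps figures quoted in the alignment headers above can be recomputed from any pair of aligned rows. The following Python sketch is illustrative only: the names `alignment_stats` and `floor_percent` are hypothetical, and the truncation of percentages is an assumption inferred from the quoted figures (for example 526/652 reported as 80%). Positives are not computed, because that would require the substitution matrix of the original search, which is not stated here.

```python
from typing import Tuple


def alignment_stats(query: str, sbjct: str) -> Tuple[int, int, int]:
    """Return (identities, gaps, length) for two equal-length aligned rows.

    A column counts as an identity when both rows carry the same residue,
    and as a gap when either row carries '-'.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    gaps = sum(1 for q, s in zip(query, sbjct) if q == "-" or s == "-")
    return identities, gaps, len(query)


def floor_percent(count: int, total: int) -> int:
    """Integer percentage, truncated (assumed convention, see lead-in)."""
    return count * 100 // total


if __name__ == "__main__":
    # First eleven columns of the GAS/GBS alignment shown above.
    ident, gaps, length = alignment_stats("MKIQDLLKKEV", "MKIQDLLRKDI")
    print(ident, gaps, length, floor_percent(ident, length))
```

Rows that were damaged during text extraction will of course give counts that differ from the header values, so a mismatch is a quick way to flag a corrupted row.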

Example 1954 

A DNA sequence (GBSx2063) was identified in S.agalactiae <SEQ ID 6055> which encodes the amino 
acid sequence <SEQ ID 6056>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1532 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC24914 GB:AF012285 fructose-1-phosphate kinase [Bacillus subtilis] 

Identities = 146/303 (48%) , Positives = 197/303 (64%) 

Query: 1 MIYTVTLNPSIDFIVRLDTLLLGSVNRMTSDDKYVGGKGINVSRILKRLKIDNTATGFIG 60 
MIYTVTLNPS+D+IV ++ +G +NR + D KY GGKGINVSR+LKR + + A GF+G 
Sbjct: 1 MIYTVTLNPSvTJYIVHWDFWGGLiNRSSYDTKYPGGKGINVSRLLKRHHVASKALGFVG 60 

Query: 61 GFTGHFVEDGLVLEGIKTDFVSVNEDTRINVKVKAKIETEINGGGPRITNEQLHRLEKLL 120 
GFTG +++ L E ++T F V DTRINVK+K ETEING GP I++E + 
Sbjct: 61 GFTGEYIKTFLREENLETAFSEVKGDTRINVKLKTGDETEINGQGPTISDEDFKAFLEQF 120 

Query: 121 SRLTPEDTWFAGSAPASLGNKVYNTLIPIAKKTGAEWCDFEGQTLLDALAYQPLLVKP 180 
L D W AGS P+SL + Y + K+ A W D G+ LL A +P L+KP 
Sbjct: 121 QSLQEGDIWLAGSIPSSLPHDTYEKIAEACKQQNARVVLDISGEALLKATEMKPFLMKP 180 

Query: 181 NNHELADIFGVELEGLPDIEKYAHKILDKGAKNVIVSMAGDGALLVTPEASYFAKPIKGE 240 
N+HEL ++FG + + + Y K++++GA++VIVSMAGDGALL T EA YFA KG+ 
Sbjct: 181 NHHELGEMFGTAITSVEFAVPYGKKLVECjGAEHVIVSI^GDGALLFTNFAVYFANVPKGK 240 

Query: 241 VKNSVGAGDSMVAGFTGEFVKSKNPVEALKWGVACGTATTFSDDLATAEFIQDIYNKVEV 300 
+ NSVGAGDS+VAGF K EA + GV G+AT FS++L T EF+Q + +V+V 
Sbjct: 241 LTOSVGAGDSWAGFIAGISKQLPLEEAFRLGVTSGSATAFSEELGTEEFVQQLLPEVKV 300 

Query: 301 EKL 303 
+L 
Sbjct: 301 TRL 303 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6057> which encodes the amino acid 
sequence <SEQ ID 6058>. Analysis of this protein sequence reveals the following: 

Possible site: 57 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1738 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 222/302 (73%) , Positives = 261/302 (85%) 

Query: 1 MIYTVTLNPSIDFIVRLDTLLLGSVNRMTSDDKYVGGKGINVSRILKRLKIDNTATGFIG 60 

MIYTVTLNPSIDFIVR+D + LGSVNRM SDDK+ GGKGINVSRIL+RL I +TATGF+G 
Sbjct: 1 MIYTVTLNPSIDFIVRIDQINLGSVNRMASDDKFAGGKGINVSRILQRLDIASTATGFLG 60 

Query: 61 GFTGHFVEDGLVLEGIKTDFVSVMEDTRINVKVKAKIETEINGGGPRITNEQLHRLEKLL 120 

GFTG F+E+ L EG+KTDFV ++DTRINVK+K++ ETE+NG GP I+ EQL L+ L 
Sbjct: 61 GFTGRFIEESLSAEGVKTDFVKGDQDTRINVKIKSQEETELNGQGPIISQEQLEDLKTKL 120 

Query: 121 SRLTPEDTWFAGSAPASLGNKVYNTLIPIAKKTGAEWCDFEGQTLLDALAYQPLLVKP 180 

S+LT EDTWFAGSAPA+LGN VY L+P+ +++GA+WCDFEGQTL+DALAY PLLVKP 
Sbjct: 121 SQLTAEDTWFAGSAPANLGNAVYKELLPLVRQSGAQWCDFEGQTLIDALAYNPLLVKP 180 

Query: 181 NNHELADIFGVELEGLPDIEKYAHKILDKGAKNVIVSMAGDGALLVTPEASYFAKPIKGE 240 

NNHEL IFG L L D+E YA ++L+ GA+NVI+SMAGDGALLVT EA+YFAKPIKGE 
Sbjct: 181 NNHELEAIFGTILTSLDDVETYARRLLEMGAQNVIISMAGDGALLVTKEATYFAKPIKGE 240 

Query: 241 VKNSVGAGDSMVAGFTGEFVKSKNPVEALKWGVACGTATTFSDDLATAEFIQDIYNKVEV 300 

VKNSVGAGDSMVAGFTGEF+KS+NP+EALKWGVACGTAT FSDDLAT FI++ Y+KVEV 
Sbjct: 241 VKNSVGAGDSMVAGFTGEFMKSQNPIEALKWGVACGTATAFSDDLATIAFIKETYHKVEV 300 

Query: 301 EK 302 
EK 

Sbjct: 301 EK 302 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
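
The "Final Results" listings above follow a regular pattern (a localisation site, a certainty value and a label), so they can be read mechanically. The Python sketch below is only an illustration under stated assumptions: `parse_final_results` and `best_site` are hypothetical helper names, and the regular expression is written to tolerate the stray spaces that text extraction introduced into values such as `0. 1532`.

```python
import re
from typing import Dict, Tuple

# Matches e.g. "bacterial cytoplasm Certainty=0. 1532 (Affirmative)".
# Optional dashes between the site and "Certainty" are tolerated.
_RESULT_LINE = re.compile(
    r"bacterial\s+(?P<site>[a-z]+)[\s-]*Certainty\s*=\s*"
    r"(?P<value>[0-9][0-9 .]*)\((?P<label>[^)]*)\)"
)


def parse_final_results(text: str) -> Dict[str, Tuple[float, str]]:
    """Map each localisation site to (certainty, label)."""
    results: Dict[str, Tuple[float, str]] = {}
    for match in _RESULT_LINE.finditer(text):
        # Remove extraction-induced spaces inside the number.
        value = float(match.group("value").replace(" ", ""))
        results[match.group("site")] = (value, match.group("label").strip())
    return results


def best_site(results: Dict[str, Tuple[float, str]]) -> str:
    """Return the site with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


if __name__ == "__main__":
    listing = (
        "bacterial cytoplasm Certainty=0. 1532 (Affirmative) < succ>\n"
        "bacterial membrane Certainty=0 . 0000 (Not Clear) < suco\n"
        "bacterial outside Certainty=0. 0000 (Not Clear) < suco\n"
    )
    parsed = parse_final_results(listing)
    print(best_site(parsed), parsed)
```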

Example 1955 

A DNA sequence (GBSx2064) was identified in S.agalactiae <SEQ ID 6059> which encodes the amino 
acid sequence <SEQ ID 6060>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2769 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9961> which encodes amino acid sequence <SEQ ID 9962> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC24913 GB:AF012285 FruR [Bacillus subtilis] 
Identities = 97/247 (39%) , Positives = 148/247 (59%) , Gaps = 4/247 (1%) 

Query: 23 MLKSKRKEIILSRLEQNKSVTLDELTSILETSESTVRRDLDELESAGFLKRVHGGAELPY 82 
ML +R ++I+ ++E++ V + EL ++ SEST+RRDL- IiE GFLKRVHGGA 
Sbjct: 1 MLTPERHQLI IDQIEKHDWKIQELINLTKftSESTIRRDLSTLEERGFLKRVHGGAAKLS 60 

Query: 83 
+ E EK+ KN+ KL IA + A L+ + D I++DAGTTT +IDF+ + + W 
Sbjct: 61 DIRLEPDMLEKSSKNLHDKLKIAEKAASLLEEGDCIYLDAGTTTLHMIDFMDKTKDIVW 120 

Query: 142 
TN + H L+ + I ++GG VKH T A IG ++ + Q DK+FLG NG+ E 
Sbjct: 121 

Query: 201 
TTPD +EA +K+ I ++ ++L D SK G+++F+ I D ++T TD+E +T 
Sbjct: 181 

Query: 261 
+ K + 
Sbjct: 239 iTQEKTV 245 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6061> which encodes the amino acid 
sequence <SEQ ID 6062>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2604 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 135/237 (56%) , Positives = 184/237 (76%) 

Query: 33 LSRLEQNKSVTLDELTSILETSESTVRRDLDELESAGFLKRVHGGAELPYSLGQELSNQE 92 
++++ + V+L++L +L +SEST+RRDL ELE G L RVHGGAEL +SL +ELSNQE 
Sbjct: 1 MAKITEENYVSLEDLMQLLNSSESTIRRDLGELEQEGRLHRVHGGAELFHSLQEELSNQE 60 

Query: 93 KAiraWQKKLDIARQTAKLIAKQDVIFIDAGTTTELLIDFLPHEQLTVVTNSIHHAAKLV 152 
K++KN K IA++ ++LI DVIFIDAGTTTE h+ FL + LTWTNSIHHAA+LV 
Sbjct: 61 KSVKNSHIKKAIAQRASQLIYDNDVIFIDAGTTTEFLLPFLQAKNLTVVTNSIHHAARLV 120 

Query: 153 DRGIKTIIIGGAVKHSTDASIGQVAINQIRQITVDKAFLGMNGIDEVYLTTPDLEEAAIK 212 
+ I+TII+GG VK +TDASIG VA+ QIRQ+ DKAFLGMNG+D+ YLTTPD+EEA IK 
Sbjct: 121 

Query: 213 
+A+++N++ +IL+D +KIGQV+F KV IND+ ++T + ++ IKEK KVI++ 
Sbjct: 181 KAVLSNAKIAYILVDGTKIGQVSFVKVAPINDVTIITLGGSASILKQIKEKAKVIEL 237 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1956 

A DNA sequence (GBSx2065) was identified in S.agalactiae <SEQ ID 6063> which encodes the amino 
acid sequence <SEQ ID 6064>. This protein is predicted to be beta-lactam resistance factor. Analysis of this 
protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.5777 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB89121 GB:AJ277485 beta-lactam resistance factor 
[Streptococcus pneumoniae] 
Identities = 215/410 (52%) , Positives = 283/410 (68%) 

Query: 1 MTLRELTIEEFKEHSGNYDSQSFLQTPEMAKLLEKRGYDVRYLGYQVENKLEIISLSYIM 60 
M L LT EEF+ +S S+SF+Q+ +M LLEKRG + YL + E ++++ +L Y + 

Sbjct: 1 MALTTLTKEEFQTYSDQVSSRSFMQSVQMGDLLEKRGARIVYLALKQEGEIQVAALVYSL 60 

Query: 61 PVTGGFQMKIDSGPVHSNSKYLKQFYKALQGYAKSNGVLELIVEPYDDYQLFTSSGVPSN 120 
P+ GG M+++SGP+++ L FY L+ YAK NGVLEL+V+PY+ YQ F S G P + 
Sbjct: 61 PMLGGLHMELNSGPIYTQQDALPVFYAELKEYAKQNGVLELLVKPYETYQTFDSQGNPID 120 

Query: 121 QGNDNLIEDFTSSGYHHDGLTTGFTGKYLSVfflYVKNLEGVTSETLLSSFSKTGRALVKKA 180 

++I+D T GY DGLTTG+ G W Y K+L +T ++LL SFSK G+ LVKKA 
Sbjct: 121 AEKKSIIQDLTDLGYQFDGLTTGYPGGEPDWLYYKDLTELTEKSLLKSFSKKGKPLVKKA 180 


Query: 181 MSFGIKVRVLKRDELHLFKEITTSTSNRRDYMDKSLDYYQDFYDSFEGKAEFVIATMIFR 240 

+FGI+++ LKR+EL +FK IT TS RR+Y DKSL+YY+ FYD+F +AEF+IA+LNF 
Sbjct: 181 ETFGIRLKKLKREELSIFKNITKETSERREYSDKSLEYYEHFYDTFGEQAEFLIASLNFS 240 

Query: 241 EYDHNLQIKAEALENKLKLLDERFRENADSPKYHRQRSEIINQLASFETRRQEVQSFIQK 300 

+Y LQ + LE L L +N S K Q E +Q +FE R+ E + 1+K 

Sbjct: 241 DYMSKLQGEQSKLEENLDKLRLDLSKNPHSEKKQNQLREYSSQFETFEVRKAEARDLIEK 300 

Query: 301 YDNQDVVIAGSLFVYSLKETWFFSGSYTEFNKFYAPAV^EYVMQEALKRGSTFYNLLG 360 
Y +D+VLAGSLFVY +ET Y FSGSYTEFNKFYAPA+LQ+YVM E++KRG YN LG 

Sbjct: 301 YGEEDIVIiAGSLFvYMPQETTYLFSGSYTEFNKFYAPALLQK^n/MLESIKRGIPKYNFIiG 360 

Query: 361 IQGTFDGSDSILRFKQNFNGCIIRKMGTFNYYPSPFKYRGIQLLKKVLKR 410 
IQG FDGSD +LRFKQNFNG I+RK GTF Y+PSP KYK IQLLKK++ R 
Sbjct: 361 IQGIFDGSDGVLRFKQNFNGYIVRKAGTFRYHPSPIjKYKAIQLLKKIVGR 410 

There is also homology to SEQ ID 5460. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1957 

A DNA sequence (GBSx2066) was identified in S.agalactiae <SEQ ID 6065> which encodes the amino 
acid sequence <SEQ ID 6066>. This protein is predicted to be cell wall protein, 40 kDa (sr 5' region). 
Analysis of this protein sequence reveals the following: 

Possible site: 42 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -3.45 Transmembrane 25 - 41 ( 23 - 42) 

Final Results 

bacterial membrane Certainty=0.2381 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9963> which encodes amino acid sequence <SEQ ID 9964> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

!GB:AF278686 choline binding protein D; CbpD [Strept... 

>GP:AAF87768 GB:AF278686 choline binding protein D; CbpD 
[Streptococcus pneumoniae] 
Identities = 63/230 (27%) , Positives = 108/230 (46%) , Gaps = 34/230 (14%) 


Query: 324 WTEQGGQDDI KWYTAVTTGDG -NYKWAVSFADHKNEKGLYNIHLYYQEASGTLVG 377 

W+ G + W + V GD NY S+ + +++++ G VG 

Sbjct: 123 WSTAGTYGHVAWVSNVM - GDQIE IEE YNYGYTESYNKRVI KANTMTGFI HFKDLDGGSVG 181 

Query: 378 VTGTKVTVAGTNSSQEPIENGLAKTGVYNIIGSTEVKNEAKISSQTQFTLEKGDKINYDQ 437 
+ + + GT+ + + +K E S G+K++YDQ 

Sbjct: 182 NSQSSTSTGGTHYFKT KSAIKTEPLASGTVIDYYYPGEKVHYDQ 225 

Query: 438 VLTADGYQWISYKSYSGVRRYIPVKKLTTSSEKAKDEATKPTSYPNLPKTG-TYTFTKTV 496 
+L DGY+W+SY +Y+G RY+ ++ + + P L TG T+ F 

Sbjct: 225 I LEKDGYKWLS YTAYNGS YRYVQLEAVNKN PLGNS VLS STGGTHYFKTKS 275 

Query: 497 DVKSQPKVSSPVEFNFQKGEKIHYDQVLWDGHQWISYKSYSGIRRYIEI 546 
+K++P VS+ V + GEK+HYDQ+L DG++W+SY +Y+G RRYI++ 
Sbjct: 276 AIKTEPLVSATVIDYYYPGEKVHYDQILEKDGYKWLSYTAYNGSRRYIQL 325 

Identities = 49/161 (30%) , Positives = 85/161 (52%) , Gaps = 14/161 (8%) 

Query: 116 GNYVYSKETEVKNTPSKSAPVAFYAKKGDKVFYDQVFNKDNVKWISYKSFCGVRRYAAIE 175 
G + + +++K P S V Y G+KV YDQ+ KD KW+SY ++ G RY +E 
Sbjct: 191 GTHYFKTKSAIKTEPLASGTVIDYYYPGEKVHYDQILEKDGYKWLSYTAYNGSYRYVQLE 250 

Query: 176 SLDPSGGSETKAPTPVTNSGSNNQEKIATQGNYTFSHKVEVKNEAKVASPTQFTLDKGDR 235 

+++ + P+ NS + +T G + F K +K E V++ G++ 

Sbjct: 251 AVNKN PLGNSVLS STGGTHYFKTKSAIKTEPLVSATVIDYYYPGEK 296 


Query: 236 IFYDQILTIEGNQWLSYKSFNGVRRFVLLGKASSVEKTEDK 276 

+ YDQIL +G +WLSY ++NG RR++ L +S + +++ 
Sbjct: 297 VHYDQILEKDGYKWLSYTAYNGSRRYIQLEGVTSSQNYQNQ 337 
Identities = 52/192 (27%) , Positives = 90/192 (46%) , Gaps = 13/192 (6%) 

Query: 295 ISNETTTGFDILITOIKDDNGIAAV1CVPVWTEQGGQDDIKWYTAVTTGDGNYKVAVSFAD 354 

I T TGF + KD +G + T GG K +A+ T + + 

Sbjct: 161 IKANTMTGF IHFKDLDGGSVGNSQSSTSTGGTHYFKTKSAIKTEPLASGTVIDYY- 215 

Query: 355 HKNEKGLYNIHLY YQEASGTLVGVTGTKVTVAGTNSSQEPIENGLAKT--GVYNIIG 409 

+ EKY+L Y+ST + V+ N+P+N+ + G + 
Sbjct: 216 YPGEKVHYDQILEKDGYKWLSYTAYNGSYRYVQLEAVNKN- - PLGNSVLSSTGGTHYFKT 273 

Query: 410 STEVKNEAKISSQTQFTLEKGDKINYDQVLTADGYQWISYKSYSGVRRYIPVKKLTTSSE 469 
+ +K E +S+ G+K++YDQ+L  DGY+W+SY +Y+G RRYI ++ + TSS+ 

Sbjct: 274 KSAIKTEPLVSATVIDYYYPGEKVHYDQILEKDGYKWLSYTAYNGSRRYIQLEGV-TSSQ 332 

Query: 470 KAKDEATKPTSY 481 
++++ +SY 
Sbjct: 333 NYQNQSGNISSY 344 

Identities = 33/113 (29%) , Positives = 56/113 (49%) , Gaps = 2/113 (1%) 

Query: 91 NTATKDITTPLVETKPMVEKTLPEQGNYVYSK-ETEVKNTPSKSAPVAFYAKKGDKVFYD 149 
N+ + + V P+ L G YK+++KPSAVY G+KV YD 
Sbjct: 241 NGSYRWQLEAWKKPLGNSVLSSTGGTHYFKTKSAIKTEPLVSATVIDYYYPGEKVHYD 300 

Query: 150 QVFNKDNVKWISYKSFCGVRRYAAIESLDPSGGSETKAPTPVTNSGSNNQEKI 202 

Q+ KD KW+SY ++ G RRY +E + S + ++ +++ GS++ + 
Sbjct: 301 QILEKDGYKWLSYTAYNGSRRYIQLEGVTSSQNYQNQSGN-ISSYGSHSSSTV 352 






A related GBS gene <SEQ ID 8937> and protein <SEQ ID 8938> were also identified. Analysis of this 
protein sequence reveals the following: 



Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: -6.74 
GvH: Signal Score (-7.5): 1.26 

Possible site: 42 
>>> Seems to have no N-terminal signal sequence 

ALOM program count: 1 value: -3.45 threshold: 0.0 

INTEGRAL Likelihood = -3.45 Transmembrane 22 - 39 ( 23 - 

PERIPHERAL Likelihood = 6.26 371 
modified ALOM score: 1.19 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.2381 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the databases: 

41.2/57.9% over 283aa 
Streptococcus mutans 
EGAD|33594| cell wall protein, 40 kDa (sr 5' region) Insert characterized 
PIR|A60328|A60328 40K cell wall protein precursor (sr 5' region) - (strain OMZ175, serotype f) Insert characterized 



ORF02145(301 - 1803 of 2238) 

EGAD|33594|34911 (30 - 313 of 335) cell wall protein, 40 kDa (sr 5' region) {Streptococcus mutans} 
PIR|A60328|A60328 40K cell wall protein precursor (sr 5' region) - Streptococcus mutans (strain OMZ175, serotype f) 
%Match = 8.0 
%Identity = 41.1 %Similarity = 57.9 
Matches = 81 Mismatches = 79 Conservative Sub.s = 33 

156 186 216 246 276 306 336 366 

* YA* * * * FCYTKNNKSWVFFSRSIYSIKYYICITNISKIC*HVTKRIL* * *CK* IRK*VFMMKKGQVNDTKQSYSLRKYK 

mQKIWISSFYMLGAHSFSKAVYHNDRSVKLMKRIDINHQAQRFSIRKYA 
10 20 30 40 50 

396 426 456 486 516 546 576 606 

FGLASVILGSFIMVTSPVFADQTTSVQVNNQTGTSVDANNSSNETSASSVITSNNDSVQASDKVVNSQNTATKDITTPLV 

|| |||::| : : : | |: I = = h :|| I I = I I HI 

FGAASVLIGCVFFLGTQNVSAQEQGTQL PASENAWNVAENSVAISQAVADKAATQTTLTETPQV 

70 80 90 100 110 



654 684 714 

ETKPMVEK- TLPEQGNYVYSKETEVKNTPSKSAPVAF 

| : | ::| lllll: = I III I M 

EVEEKESKVNAPALNVDDKGAKSKEDVN AEQNEKAVRENLMCRQAKAVS I PSQGNYVFQETTPVKNAASMSSP- - - 

130 140 200 210 '220 230 240 

744 1533 1563 1593 1623 1653 1683 

YAKKGDKVFYDQVFNKD GVYNIIGSTEVKNEAKISSQTQFTLEKGDKINYDQVLTADGYQWISYKSYSGVRRYIPV 

III ::||||: II II llhlllll 1111 = 111 1 = 
TQFNFDKGDKVFYDNVLEADGHQWISYVSYSGIRRYAPI 

250 260 270 

1713 1743 1773 1803 1833 1863 1893 1923 

KKLTTSSEKAKDFATKPTSYPNLPKTGTYTFTKTVDW 

: :: I III III III =1 = I 
AVTIEELKQKEIVQQNLPAQGTYHFTKQQSLKMKLNCLVRPNSRFTTEITFFMIRF 

290 300 310 320 330 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6067> which encodes the amino acid 
sequence <SEQ ID 6068>. Analysis of this protein sequence reveals the following: 

Possible site: 23 
>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:AAF87768 GB:AF278686 choline binding protein D; CbpD 
[Streptococcus pneumoniae] 
Identities = 93/217 (42%) , Positives = 136/217 (61%) , Gaps = 18/217 (8%) 

Query: 42 GDNYPSKWKKGNG-IDSWNMYIRQCTSFAAFRLSSANGFQLPKGYGNACTWGHIAKNQGY 100 

GD+YP+ +K G+ ID W MY RQCTSF AFRLS+ NGF++P YGNA WGH A+ +GY 
Sbjct: 51 GDDYPAYYKNGSQEIDQWRMYSRQCTSFVAFRLSNVNGFEIPAAYGNANEWGHRARREGY 110 

Query: 101 PVNKTPSIGAIAWFDKNAYQSNAAYGHVAWADIRGDTVTIEEYNYNAGQGPERYHKRQI 160 

V+ TP+IG+I W + YGHVAWV+++ GD + IEEYNY E Y+KR I 

Sbjct: 111 RVDNTPTIGSITW STAGTYGHVAWVSNVMGDQIEIEEYNYGY TESYNKRVI 161 

Query: 161 PKSQVSGYIHFKDLSSQTSHSYPRQLKHISQASFDPSGTYHFTTRLPVKGQTSIDSPDLA 220 

+ ++G+IHFKDL + + SQ+S GT++F T+ +K + + 

Sbjct: 162 KANTMTGFIHFKDLDGGSVGN SQSSTSTGGTHYFKTKSAIKTEPLASGTVID 213 

Query: 221 YYEAGQSVYYDKWTAGGYTWLSYLSFSGNRRYIPIK 257 

YY G+ V+YD+++ GY WLSY +++G+ RY+ ++ 
Sbjct: 214 YYYPGEKVHYDQILEKDGYKWLSYTAYNGSYRYVQLE 250 

An alignment of the GAS and GBS proteins is shown below. 
Identities = 34/94 (36%) , Positives = 52/94 (55%) 

Query: 453 SGVRRYIPVKKLTTSSEKAKDEATKPTSYPNLPKTGTYTFTKTVDVKSQPKVSSPVEFNF 512 

S V YI K L++ + + K S + +GTY FT + VK Q + SP + 

Sbjct: 163 SQVSGYIHFKDLSSQTSHSYPRQLKHISQASFDPSGTYHFTTRLPVKGQTSIDSPDLAYY 222 

Query: 513 QKGEKIHYDQVLWDGHQWISYKSYSGIRRYIEI 546 

+ G+ ++YD+V+ G+ W+SY S+SG RRYI I 
Sbjct: 223 EAGQSVYYDKVVTAGGYTWLSYLSFSGNRRYIPI 256 

Identities = 30/78 (38%) , Positives = 45/78 (57%) , Gaps = 2/78 (2%) 

Query: 402 TGVYNIIGSTEVKNEAKISSQTQFTLEKGDKINYDQVLTADGYQWISYKSYSGVRRYIPV 461 

+GY+ VK+IS EG + YD+V+TA GY W+SY S+SG RRYIP+ 

Sbjct: 197 SGTYHFTTRLPVKGQTSIDSPDLAYYEAGQSVYYDKVVTAGGYTWLSYLSFSGNRRYIPI 256 

Query: 462 KKLTTSSEKAKDEATKPT 479 

K+ + +++ TKP+ 
Sbjct: 257 KE--PAQSWQNDNTKPS 272 
Identities = 27/94 (28%) , Positives = 47/94 (49%) 

Query: 198 NQEKIATQGNYTFSHKVEVKNEAKVASPTQFTLDKGDRIFYDQILTIEGNQWLSYKSFNG 257 

+Q G Y F+ ++ VK + + SP + G ++YD+++T G WLSY SF+G 

Sbjct: 190 SQASFDPSGTYHFTTRLPVKGQTSIDSPDLAYYEAGQSVYYDKVVTAGGYTWLSYLSFSG 249 

Query: 258 VRRFVLLGKASSVEKTEDKEKVSPQPQARITKTG 291 

RR++ +++ DKS+ +TG 

Sbjct: 250 NRRYIPIKEPAQSWQNDNTKPSIKVGDTVTFPG 283 
Identities = 23/73 (31%) , Positives = 35/73 (47%) 

Query: 103 ETKPMVEKTLPEQGNYVYSKETEVKNTPSKSAPVAFYAKKGDKVFYDQVFNKDNVKWISY 162 

+ K + + + GY++ VK S+P Y + G V+YD+V W+SY 
Sbjct: 185 QLKHISQASFDPSGTYHFTTRLPVKGQTSIDSPDLAYYEAGQSVYYDKVVTAGGYTWLSY 244 

Query: 163 KSFCGVRRYAAIE 175 

SF G RRY I+ 
Sbjct: 245 LSFSGNRRYIPIK 257 

SEQ ID 8938 (GBS91) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 18 (lane 7; MW 63kDa). 

The GBS91-His fusion product was purified (Figure 195, lane 9) and used to immunise mice. The resulting 
antiserum was used for FACS (Figure 283), which confirmed that the protein is immunoaccessible on GBS 
bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1958 

A DNA sequence (GBSx2067) was identified in S.agalactiae <SEQ ID 6069> which encodes the amino 
acid sequence <SEQ ID 6070>. This protein is predicted to be thiamine biosynthesis protein. Analysis of 
this protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0984 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB49673 GB:AJ248285 PROBABLE 2-DEHYDROPANTOATE 2-REDUCTASE (EC 
1.1.1.169) [Pyrococcus abyssi] 
Identities = 85/301 (28%) , Positives = 150/301 (49%) , Gaps = 7/301 (2%) 



Query: 1 MLVYIAGSGAMGCRFGYQISKTNHDVILLDNWADHIMAIKENGLKVTGDTEDLVKLPIMK 60 
M +YI G+GA+G FG ++ DV+L+ H+ AI E GLK+ G + VK+ 
Sbjct: 1 MKIYILGAGAIGSLFGGLIjANAGEDVLIiIGR-DPHVSAlNEKGLKIVGIKDLNVKVEATT 59 

Query: 61 PTDATEEADLIILFTKAMQLPNMLQDIKKIIGKKTKVLCLLNGLGHEDVIRQYIPEHNIL 120 
E+ DLI+L TK+ L+ + 1+ K + VL + NG+G+ED I ++ + 
Sbjct: 60 RVPE-EKPDLIVLATKSYSTIEALKSARHIV-KGSWVLSIQNGIGNEDKIIEF--GGKAI 115 

Query: 121 MGVTVWTAGLKGPGHAHLEGVGSVNLQSIDPNNQEAGHRVTELLNEAKLQATYDENVLPN 180 
G+T A ++ PG G G + ++ +V ++ N A ++ EN++ 
Sbjct: 116 GGITTNGAMVFAPGVIKOT'GKGVTIIGLYPQGKEKFIEKVADVFNSADIETHVSENIISW 175 

Query: 181 IWRKACVNGTMNSTCALLDCTIGQLFASEDGVNMVHEIIHEFVTVGKAEGVELDEEEITK 240 
IW KA VN +N LL+ + ++ ++M E++ E V G+E D + 
Sbjct: 176 IWAKAIVNSAINPIGTLLEVKNKVIRENDFLLSMAMEWKEGCRVALQNGIEFDVPPMDL 235 

Query: 241 YVMDTSVKAAHHYPSMHQDLVQNQRLTEIDFLNGAVNKKGENLGIDTPYCRLITQLIHTKE 301 
+ T + +Y SM QD+ + ++ TE+D++NG + + + + ++ P L+ LI KE 
Sbjct: 236 F-FQTLEQTRENYNSMLQDIWRGKK-TEVDYINGKIVEYAKAVNLEAPMNLLLWGLIKGKE 294 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6071> which encodes the amino acid 
sequence <SEQ ID 6072>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1392 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 



WO 02/34771 



PCT/GB01/04789 





Identities = 262/307 (85%), Positives = 288/307 (93%) 

Query: 1 MLVYIAGSGAMGCRFGYQISKTNHDVILLDNWADHIMAIKENGLKVTGDTEDLVKLPIMK 60 
MLWIAGSGAMGCRFGYQISKTN+DVILLDNW DHI AIKENGL VTGD E+ VKLPIMK 
Sbjct: 1 MLWIAGSGAMGCRFGYQISKTffltfDVILLDNWEDHINAIKEN^ 60 

Query: 61 PTDATEEADLIILFTKAMQLPNMLQDIKKIIGKKTKVLCLLNGLGHEDVIRQYIPEHNIL 120 
PT+AT+EADLIILFTKAMQLP MLQDIK IIGK+TKVLCLLNGLGHEDVIRQYIPEHNIL 
Sbjct: 61 PTEATQEADLIILFTKAMQLPQMLQDIKGIIGKETKVLCLLNGLGHEDVIRQYIPEHNIL 120 

Query: 121 MGVTVWTAGLKGPGHAHLEGVGSVNLQSIDPNNQEAGHRVTELLNEAKLQATYDENVLPN 180 
MGVTVWTAGL+GPG AHL+GVG++NLQS+DP+NQEAGH+V +LLNEA L ATYDENV+PN 
Sbjct: 121 MGVTVWTAGLEGPGRAHLC^VGAIjNLQSMDPSNQEAGHQVADLIiNE 180 

Query: 181 IWRKACVNGTMNSTCALLDCTIGQLFASEDGVNMVHEIIHEFVTVGKAEGVELDEEEITK 240 
IWRKACVNGTMNSTCALLDCTIG+LFASEDG+ MV EIIHEFV VG+AEGVEL+EEEIT+ 
Sbjct: 181 IWRKACVNGTMNSTCALLDCTIGELFASEDGLKMVKEIIHEFVIVGQAEGVELNEEEITQ 240 

Query: 241 YVMDTSVKAAHHYPSMHQDLVQNQRLTEIDFLNGAVNKKGENLGIDTPYCRLITQLIHTK 300 
YVMDTSVKAAHHYPSMHQDLVQN RLTEIDF+NGAVN KGE LGI+TPYCR+IT+L+H K 
Sbjct: 241 YVMDTSVKAAHHYPSMHQDLVQJSHRLTEIDFINGAVOTKGEKLGINTPYCRMITELVHAK 300 

Query: 301 ENVLSIK 307 
E VL+I+ 
Sbjct: 301 EAVLNIQ 307 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1959 

A DNA sequence (GBSx2068) was identified in S.agalactiae <SEQ ID 6073> which encodes the amino 
acid sequence <SEQ ID 6074>. Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -3.03 Transmembrane 61 - 77 ( 61 - 78) 
INTEGRAL Likelihood = -1.33 Transmembrane 80 - 96 ( 79 - 96) 

Final Results 

bacterial membrane Certainty=0.2211 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
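
The INTEGRAL lines in the analysis above each report a likelihood score and a predicted transmembrane span. The Python sketch below shows one way such lines might be collected into an ordered list of segments. It is an illustration only: `Segment` and `parse_integral_lines` are hypothetical names, the second, parenthesised range on each line is ignored because its meaning is not stated here, and the listings only suggest (they do not state) that scores below the 0.0 threshold are the ones reported as INTEGRAL.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    likelihood: float  # score printed after "Likelihood ="
    start: int         # first residue of the predicted span
    end: int           # last residue of the predicted span


# Matches e.g. "INTEGRAL Likelihood = -3.03 Transmembrane 61 - 77 ( 61 - 78)".
_INTEGRAL_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_integral_lines(text: str) -> List[Segment]:
    """Return predicted transmembrane segments ordered by start position."""
    segments = [
        Segment(float(m.group(1)), int(m.group(2)), int(m.group(3)))
        for m in _INTEGRAL_LINE.finditer(text)
    ]
    return sorted(segments, key=lambda seg: seg.start)


if __name__ == "__main__":
    listing = (
        "INTEGRAL Likelihood = -3.03 Transmembrane 61 - 77 ( 61 - 78)\n"
        "INTEGRAL Likelihood = -1.33 Transmembrane 80 - 96 ( 79 - 96)\n"
    )
    for seg in parse_integral_lines(listing):
        print(seg.start, seg.end, seg.likelihood)
```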

Example 1960 

A DNA sequence (GBSx2069) was identified in S.agalactiae <SEQ ID 6075> which encodes the amino 
acid sequence <SEQ ID 6076>. This protein is predicted to be regulatory protein (pfoS/R). Analysis of this 
protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -9.82 Transmembrane 317 - 333 ( 304 - 335) 
INTEGRAL Likelihood = -7.64 Transmembrane 187 - 203 ( 183 - 217) 
INTEGRAL Likelihood = -5.26 Transmembrane 24 - 40 ( 18 - 44) 
INTEGRAL Likelihood = -5.04 Transmembrane 143 - 159 ( 139 - 161) 
INTEGRAL Likelihood = -2.34 Transmembrane 116 - 132 ( 115 - 136) 
INTEGRAL Likelihood = -2.13 Transmembrane 55 - 71 ( 55 - 71) 
INTEGRAL Likelihood = -0.96 Transmembrane 268 - 284 ( 268 - 284) 

Final Results 

bacterial membrane Certainty=0.4927 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC65034 GB:AE001189 regulatory protein (pfoS/R) [Treponema 
pallidum] 

Identities = 138/358 (38%) , Positives = 220/358 (60%) , Gaps = 18/358 (5%) 

Query: 2 TNTVTPKETAGSFINKVLGGTATAIWALIPNAILATFLKPFLSYG-LAAEFLHIVQVFQ 60 

T +++P++ F+ K+L G++ IV+ L+P AI + LA H+V Q 

Sbjct: 3 TQSLSPRQ FMMKILNGSSAGIVIGLVPPAIAGELFRALAPLSPLFAALYHWLPIQ 58 

Query: 61 FFTPIMAGFLIGQQFKFTPMQQLAVGGAAYIGSGAWAYTEVIQKGVATGSFQLRGIGDLI 120 

F P + G L+G QF + + + + I SG + G++ + GIGD+I 

Sbjct: 59 FSVPALIGTLVGLQFHCSAPEVATLAFVSVIASG NVTLQNGAWLITGIGDVI 110 

Query: 121 NMMLTAALAVLAVKWFGNKFGSLTIILLPIIIGTGVGYLGWKLLPYVSYVTTLIGQGINS 180 

N+ML +ALA++ V+ K GSLTII LP+I+ G +G LPYV +T +G+ I + 
Sbjct: 111 NVMLISAIiAIILVRALRGKLGSLTIIALPVimWAGGVGSFSLPYVKMITLFVGRVIAT 170 

Query: 181 FTTLQPIAMSILIAMAFSMLIVSPISTVAIGLAIGLNGMSASAASMGVASTTAVLVWATM 240 

F LQP+ MSIL++M+FS++I+SP+S+VA+G+A+GL G+++ AA++GV+S L+ TM 
Sbjct: 171 FIALQPLLMSILLSMSFSLIIISPVSSVAVGIAVGLTGIASGAANIGVSSCAMTLIVGTM 230 

Query: 241 KANKSGVPIAIALGAMKMmPNFLKHPV^IPMLMTAWSSLTVPLFKLVGTPASSGFGL 300 

+ NK GVP+A+ GAMKM+MPN++++P++ IP+L+ V + LF L GTPAS+GFG 
Sbjct: 231 RVNKIGVPLAMFAGAMKMLMPNWIRYPILNIPLLLNGLVCGVLAWLFNLQGTPASAGFGF 290 

Query: 301 VGAVGPIASFE--AGASMLIVILSWLVTPFAVGFVSHKICKDILKLYKDDIFVFE 353 

+G VGPI ++ A M+ 1+ L + V+ F ++ ID LKLY+ ++F+ E 
Sbjct: 291 IGLVGPINAYRLMAYTPMVRAGILFLVYFVLSFLAAYLIDFILVDRLKLYRRELFIPE 348 



There is also homology to SEQ ID 1280. 

A related GBS gene <SEQ ID 8939> and protein <SEQ ID 8940> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: -7.24 
GvH: Signal Score (-7.5): -2.94 

Possible site: 49 
>>> Seems to have no N-terminal signal sequence 

ALOM program count: 7 value: -9.82 threshold: 0.0 

INTEGRAL Likelihood = -9.82 Transmembrane 317 - 333 ( 304 - 335) 
INTEGRAL Likelihood = -7.64 Transmembrane 187 - 203 ( 183 - 217) 
INTEGRAL Likelihood = -6.37 Transmembrane 143 - 159 ( 136 - 161) 
INTEGRAL Likelihood = -5.26 Transmembrane 24 - 40 ( 18 - 44) 
INTEGRAL Likelihood = -2.34 Transmembrane 116 - 132 ( 115 - 136) 
INTEGRAL Likelihood = -2.13 Transmembrane 55 - 71 ( 55 - 71) 
INTEGRAL Likelihood = -0.96 Transmembrane 268 - 284 ( 268 - 284) 
PERIPHERAL Likelihood = 0.69 205 
modified ALOM score: 2.46 



*** Reasoning Step: 3 



Final Results 

bacterial membrane --- Certainty=0.4927 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

ORF02147(337 - 1359 of 1668) 

EGAD|138195|TP0038 (10 - 348 of 350) regulatory protein {Treponema pallidum} OMNI|TP0038
regulatory protein (pfoS/R) GP|3322295|gb|AAC65034.1||AE001189 regulatory protein (pfoS/R)
{Treponema pallidum} PIR|E71373|E71373 probable regulatory protein (pfoS/R) - syphilis
spirochete
%Match =21.6 

%Identity =40.1 %Similarity = 65.6 

Matches = 135 Mismatches = 112 Conservative Sub.s = 86 

87 117 147 177 207 237 267 297 

LQQDMGKHQSL*TKLSIIFILIEITV*SIQHH**NNYK*N**VYKKGLYILLKK*QSFLFIL*YN*LCRYE*Y*INEARY 

327 357 387 417 444 474 504 534 

FMTNTOTPKETAGSFINKVLGGTATAIWALIPNAILATFLKPFLSYG-LAAEFLHIVQVFQFFTPIMAGFLIGQQFKFT

1= |:| l== 11= 1=1 II === = 11= 1=1 111=1 1=1 II = 

MHTQSLSPRQFMMKIIJStGSSAGIVIGLVPPAIAGELFRALAPLSPLFAALYHWLPIQFSVPALIGTLVGLQFHCS 
10 20 30 40 50 60 70 

564 594 624 654 684 714 744 774

PMQQIAVGGAAY1GSGAWAYTEVIQKGVATGSFQLRGIGDLINMMLTAAIAVIAVKWFGNKFGSLXIILLPIIIGTGVGY 

===111 = l== = 1111=11=11 =111== 1= = 1=111 II 11=1= 1 

APEVATLAFVSVIASG NVTLQNGAWLITGIGDVINVMLISALAIILVRALRGKLGSLTIIALPVIVAWAGG 

90 100 110 120 130 140 

804 834 864 894 924 954 984 1014 

LGWKLLPWSYVTTLIGQGINSFTTLQPIAMSILIAMAFSMLIVSPISTVAIGLAIGLNGMSASAASMGVASTTAVLVWA 

= 1 Mil =1 = = l= I =1 111= 1 = = = ll = = ll = l 1 = 

VGSFSLPYVKMITLFVGRVIATFIALQPLLMSILLSMSFSLIIISPVSSVAVGIAVGLTGLASGAANIGVSSCAMTLIVG 
160 170 180 190 200 210 220

1044 1074 1104 1134 1164 1194 1224 1248 

TMKANKSGVPIAIALGAMKMMMPNFLKHPVmiPMIWTAWSSLTVPLFKLVGTPASSGFGLVGAVGPlASFE--AGASM 

11= 11 111 = 1= ||||| = ||| = : = = 1 = : ||:|= | : I M I I I I I = I I I = = I I I I I = = I I 
TMRVNKIGVPLAMFAGAMKMLMPNWIRYPILNIPLLLNGLVCGVLAWLFNLQGTPASAGFGFIGLVGPINAYRLMAYTPM

240 250 260 270 280 290 300 

1269 1299 1329 1359 1389 1419 1449 1479 

L- - -IVILSWLVIPFAVGFVSHKICKDILKLYKDDIFVFEGQN*FGGCMLVYIAGSGAMGCRFGYQISKTNHDVILLDNW 

: 1: | =:|: | :: I I I I I I = = = I = I

VRAGILFLVYFVLSFLAAYLIDFILVDRLKLYRRELFIPEQG 
320 330 340 350 

There is also homology to SEQ ID 1276.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1961 

A DNA sequence (GBSx2070) was identified in S.agalactiae <SEQ ID 6077> which encodes the amino 
acid sequence <SEQ ID 6078>. Analysis of this protein sequence reveals the following: 

Possible site: 20

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07127 GB:AP001518 thioredoxin reductase [Bacillus halodurans]
Identities = 163/325 (50%), Positives = 222/325 (68%), Gaps = 3/325 (0%)

Query: 5 IYDITIVGGGPVGLFAAFYAGLRGVSVKIIESLSELGGQPAILYPEKKIYDIPGYPVITG 64

+YDITI+GGGP GLFAAFY G+R VKIIES+ +LGGQ A LYPEK IYD+ G+P + 
Sbjct: 7 LYDITIIGGGPTGLFAAFYGGMRQAKVKIIESMPQLGGQLAALYPEKYIYDVAGFPKVKA 66 

Query: 65 RELIDKHIEQLERFKDSIEICLKEEVLSFEK-VDDVFTIQTDKDQHLSRAIVFACGNGAF 123 

++L++ Q E+F +1 L++ V + K DD FTI+TDK+ H S+AI+ G GAF 
Sbjct: 67 QDLVNDLKRQAEQFNPTI - -ALEQSVQNOTKETDDTFTIKTDKETHYSKAIIITAGAGAF 124 

Query: 124 APRLLGLENEENYADNNLFYNVTKLEQFAGKHVVICGGGDSAVDWANELDKIAASVAIVH 183

PR L +E + Y NL Y V L +AGK+V+I GGGDSAVDWA L+ +A +V ++H 
Sbjct: 125 QPRRLEVEGAKQYEGKNLQYFVNDLNAYAGKNVI1ISGGGDSAVDWALMLEPVAKNVTI1IH 184 

Query: 184 RRDAFRAHEHSVDILKASGVRILTPYVPIGLNGDSQRVSSLVVQKVKGDEVIELPLDNLI 243
RRD FRAHEHSV++L+ S V ILTP+ L+GD +++ + +Q+VKGD V L +D +I

Sbjct: 185 RRDKFRAHEHSVELLQKSSVNILTPFAISELSGDGEKIHHVTIQEVKGDAVETLDVDEVI 244 

Query: 244 VSFGFSTSNKNLRYWNLDYKRSSINVSSLFETTQEGVYAIGDAANYPGKVELIATGYGEA 303 
V+FGF +S ++ W L+ +++SI V++ ET G+YA GD YPGKV+LIATG+GEA 
Sbjct: 245 WFGFVSSLGPIKGWGLEIEKNSIVVNTKMETNIPGIYAAGDICTYPGKVKLIATGFGEA 304

Query: 304 PVAINQAINYIYPDRDNRVVHSTSL 328

P A+N A +I P HSTSL
Sbjct: 305 PTAVNNAKAFIDPTARVFPGHSTSL 329 
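
The Identities, Positives and Gaps figures printed above each alignment can be recomputed from the three alignment rows. The following is a minimal Python sketch, not the program that produced these listings: the function name `alignment_stats` and the toy rows are hypothetical, and percentages are truncated to whole numbers on the assumption that this matches the printed identity figures (for example, 163/325 is reported as 50%).

```python
def alignment_stats(query: str, midline: str, sbjct: str) -> dict:
    """Recompute BLAST-style summary counts from one aligned block.

    Assumes the three rows are column-aligned and equally long: the
    midline holds a letter for an identity, '+' for a conservative
    substitution and a space otherwise; '-' marks a gap in a sequence row.
    """
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("alignment rows must have equal length")
    length = len(query)
    identities = sum(1 for ch in midline if ch.isalpha())
    positives = identities + midline.count("+")
    gaps = query.count("-") + sbjct.count("-")

    def pct(n: int) -> int:
        # Integer division truncates instead of rounding.
        return (100 * n) // length

    return {
        "length": length,
        "identities": identities,
        "positives": positives,
        "gaps": gaps,
        "pct_identity": pct(identities),
        "pct_positives": pct(positives),
        "pct_gaps": pct(gaps),
    }


# Hypothetical toy rows (19 columns), not taken from any SEQ ID.
QUERY = "MKIDILTLFPEMFA-LEHS"
MIDLINE = "MKID LTLFPEMF  + HS"
SBJCT = "MKIDFLTLFPEMFQGVLHS"

stats = alignment_stats(QUERY, MIDLINE, SBJCT)
print(stats)
```

Because OCR has collapsed runs of spaces in many match lines in this document, the rows would need to be re-aligned column by column before a check of this kind is meaningful.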

A related DNA sequence was identified in S.pyogenes <SEQ ID 6079> which encodes the amino acid
sequence <SEQ ID 6080>. Analysis of this protein sequence reveals the following:

Possible site: 20 
>>> Seems to have an uncleavable N-term signal seq 
INTEGRAL Likelihood = -0.37 Transmembrane 8 - 24 ( 8 - 24)

Final Results

bacterial membrane --- Certainty=0.1150 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB15201 GB:Z99120 similar to thioredoxin reductase [Bacillus subtilis] 
Identities = 173/328 (52%) , Positives = 223/328 (67%) , Gaps = 4/328 (1%) 

Query: 4 KAYDITIIGGGPIGLFAAFYAGDRGVTVKIIESLSELGGQPAILYPEKMIYDIPAYPSLT 63

K YDITIIGGGP+GLF AFY G+R +VKIIESL +LGGQ + LYPEK IYD+ +P +
Sbjct: 6 KVYDITIIGGGPVGLFTAFYGGMRQASVKIIESLPQLGGQLSALYPEKYIYDVAGFPKIR 65

Query: 64 GVELTENLIKQLSRFEDRTTICLKEEVLTFDKVKGG-FSIRTNKAEHFSKAIIIACGNGA 122

EL NL +Q+++F+ TICL++ V + +K GF+ K K I GNGA 

Sbjct: 66 AQELINNLKEQMAKFDQ- -TICLEQAVESVEKQADGVFKLVQMKKPTTLKRSCITAGNGA 123 

Query: 123 FAPRTLGLESEENFADHNLFYNVHQLDQFAGQKVVICGGGDSAVDWALALEDIAESVTVV 182
F PR L LE+ E + NL Y V L +FAG++V I GGGDSAVDWAL LE IA+ V+++

Sbjct: 124 FKPRKLELENAEQYEGKNLHYFVDDLQKFAGRRVAILGGGDSAVDWALMLEPIAKEVSII 183 

Query: 183 HRRDAFRAHEHSVELLKASTVNLLTPYVPKALKGIGNLAEKLVIQKVKEDEVLELELDSL 242
HRRD FRAHEHSVE L AS VN+LTP+VP L G + E+LV+++VK D LE+D L
Sbjct: 184 HRRDKFRAHEHSVENLHASKVNVLTPFVPAELIGEDKI-EQLVLEEVKGDRKEILEIDDL 242

Query: 243 IVSFGFSTSNKNLKNWNLDYKRSSITVSPLFQTSQEGIFAIGDAAAYNGKVDLIATGFGE 302 

IV++GF +S +KNW LD +++SI V +T+ EG FA GD Y GKV+LIA+GFGE 
Sbjct: 243 IVNYGFVSSLGPIKNWGLDIEKNSIVVKSTMETNIEGFFAAGDICTYEGKVNLIASGFGE 302 



Query: 303 APTAVNQAINYIYPDRDNRVVHSTSLID 330

APTAVN A Y+ P + +HSTSL + 
Sbjct: 303 APTAVNNAKAYMDPKARVQPLHSTSLFE 330 



An alignment of the GAS and GBS proteins is shown below.

Identities = 242/324 (74%), Positives = 279/324 (85%)

Query: 6 YDITIVGGGPVGLFAAFYAGLRGVSVKIIESLSELGGQPAILYPEKKIYDIPGYPVITGR 65

Sbjct: 6 YDITIIGGGPIGLFAAFYAGLRGVTVKIIESLSELGGQPAILYPEKMIYDIPAYPSLTGV 65

Query: 66 ELIDKHIEQLERFKDSIEICLKEEVLSFEKVDDVFTIQTDKDQHLSRAIVFACGNGAFAP 125

EL +  I+QL RF+D   ICLKEEVL+F+KV   F+I+T+K +H S+AI+ ACGNGAFAP
Sbjct: 66 ELTENLIKQLSRFEDRTTICLKEEVLTFDKVKGGFSIRTNKAEHFSKAIIIACGNGAFAP 125

Query: 126 RLLGLENEENYADNNLFYNVTKLEQFAGKHVVICGGGDSAVDWANELDKIAASVAIVHRR 185

R LGLE+EEN+AD+NLFYNV +L+QFAG+ VVICGGGDSAVDWA  L+ IA SV +VHRR
Sbjct: 126 RTLGLESEENFADHNLFYNVHQLDQFAGQKVVICGGGDSAVDWALALEDIAESVTVVHRR 185

Query: 186 DAFRAHEHSVDILKASGVRILTPYVPIGLNGDSQRVSSLVVQKVKGDEVIELPLDNLIVS 245

DAFRAHEHSV++LKAS V +LTPYVP  L G       LV+QKVK DEV+EL LD+LIVS
Sbjct: 186 DAFRAHEHSVELLKASTVNLLTPYVPKALKGIGNIAEKLVIQKVKEDEVLELELDSLIVS 245

Query: 246 FGFSTSNKNLRYWNLDYKRSSINVSSLFETTQEGVYAIGDAANYPGKVELIATGYGEAPV 305

FGFSTSNKNL+ WNLDYKRSSI VS LF+T+QEG++AIGDAA Y GKV+LIATG+GEAP
Sbjct: 246 FGFSTSNKNLKNWNLDYKRSSITVSPLFQTSQEGIFAIGDAAAYNGKVDLIATGFGEAPT 305

Query: 306 AINQAINYIYPDRDNRVVHSTSLI 329

A+NQAINYIYPDRDNRVVHSTSLI
Sbjct: 306 AVNQAINYIYPDRDNRVVHSTSLI 329





SEQ ID 6078 (GBS178) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 38 (lane 5; MW 37.4kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 41 (lane 8; MW 62.4kDa). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1962 

A DNA sequence (GBSx2071) was identified in S.agalactiae <SEQ ID 6081> which encodes the amino 
acid sequence <SEQ ID 6082>. This protein is predicted to be tRNA methyltransferase (trmD). Analysis of 
this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1496 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06198 GB:AP001515 tRNA methyltransferase [Bacillus halodurans] 
Identities = 144/246 (58%) , Positives = 186/246 (75%) , Gaps = 6/246 (2%) 

Query: 2 MKIDILTLFPEMFAPLEHS-IVGKAKERGLLEINYHNFRENAE-KSRHVDDEPYGGGQGM 59

MKID LTLFPEMF + HS I+ +A+ERG + NFRE +E K + VDD PYGGG GM

Sbjct: 1 MKIDFLTLFPEMFQGVLHSSILKQAQERGAVSFRVVNFREYSENKHKKVDDYPYGGGAGM 60 

Query: 60 LLRAQPIFDTIDKTDAQKA---RVILLDPAGRTFDQDFAEELSKEDELIFICGHYEGYDE 116

+L QP+FD ++ + + + RVIL+ P G TF Q AEEL++ + LI +CGHYEGYDE 
Sbjct: 61 VLSPQPLFDAVEDLTKKSSSTPRVILMCPQGETFTQRKAEELAQAEHLILLCGHYEGYDE 120 

Query: 117 RIKS-LVTDEVSLGDFVLTGGELAAMTMVDATVRLIPEVIGKETSHQDDSFSSGLLEYPQ 175

RI+S LVTDE+ S +GD+ VLTGGEL AM + D+ RL+P V+G ETS Q DSFS+GLLEYPQ 
Sbjct: 121 RIRSYLVTDELSIGDYVLTGGELGAMVIADSVTRLLPAVLGNETSAQTDSFSTGLLEYPQ 180 

Query: 176 YTRPYDYLGMTVPDVLMSGHHENIRKWRLEQSLRKTLERRPDLLENYAMTDEERLILEKI 235

YTRP D+ G VPDVL+SGHH+NI +WR EQSL++TLERRPDLLE +T+EE+ +L+ I
Sbjct: 181 YTRPADFRGWKVPDVLLSGHHQNIERWRKEQSLKRTLERRPDLLEGRKLTEEEQELLDSI 240

Query: 236 KTEIER 241

+ + E+ 
Sbjct: 241 RKQQEK 246 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6083> which encodes the amino acid
sequence <SEQ ID 6084>. Analysis of this protein sequence reveals the following:

Possible site: 24 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2705 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 195/240 (81%), Positives = 224/240 (93%)

Query: 2 MKIDILTLFPEMFAPLEHSIVGKAKERGLLEINYHNFRENAEKSRHVDDEPYGGGQGMLL 61

MKIDILTLFPEMFAPLEHSIVGKAKE+GLL+I+YHNFR+ AEK+RHVDDEPYGGGQGMLL
Sbjct: 1 MKIDILTLFPEMFAPLEHSIVGKAKEKGLLDIHYHNFRDYAEKARHVDDEPYGGGQGMLL 60 

Query: 62 RAQPIFDTIDKIDAQKARVILLDPAGRTFDQDFAEELSKEDELIFICGHYEGYDERIKSL 121 

RAQPIFDTI++I+A+K R+ILLDPAG+ F Q +AEEL+ E+ELIFICGHYEGYDERIK+L 
Sbjct: 61 RAQPIFDTIEQIEAKKPRIILLDPAGKPFTQAYAEELALEEELIFICGHYEGYDERIKTL 120 

Query: 122 VTDEVSLGDFVLTGGELAAMTMVDATVRLIPEVIGKETSHQDDSFSSGLLEYPQYTRPYD 181

VTDE+SLGDFVLTGGELAAMTMVDATVRLIP+V+GKE+SHQDDSFSSGLLEYPQYTRPYD
Sbjct: 121 VTDEISLGDFVLTGGELAAMTMVDATVRLIPQVLGKESSHQDDSFSSGLLEYPQYTRPYD 180

Query: 182 YLGMTVPDVLMSGHHENIRKWRLEQSLRKTLERRPDLLENYAMTDEERLILEKIKTEIER 241 
Y GMTVPDVLMSGHHE IR WRLE+SL+KT RRPDLLE+Y ++EER +L+KIK +++

Sbjct: 181 YRGMTVPDVLMSGHHERIRLWRLEESLKKTYLRRPDLLEHYNFSEEERICLLDKIKEALDQ 240 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1963

A DNA sequence (GBSx2072) was identified in S.agalactiae <SEQ ID 6085> which encodes the amino 
acid sequence <SEQ ID 6086>. This protein is predicted to be 16S rRNA processing protein. Analysis of 
this protein sequence reveals the following: 

Possible site: 43 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.71 Transmembrane 32 - 48 ( 32 - 52) 

Final Results 

bacterial membrane --- Certainty=0.2084 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9381> which encodes amino acid sequence <SEQ ID 9382> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB13475 GB:Z99112 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 88/174 (50%) , Positives = 128/174 (72%) , Gaps = 1/174 (0%)

Query: 54 OTMEYFNVGKIVNTQGLQGEMRVLSVTDFVEERFKKGQVI^ 113 

+T +FNVGKIVNT G++GE+RV+S TDF' EER+K G L LF + +++ + +HR 

Sbjct: 1 MTKRWFNVGKIVOTHGIKGEVRVISKTDFAEERYKPGOTLYLFMDGRlffiPVEVTvNrHRL 60

Query: 114 QKNFDIIKFKGMYHINDIEKYKGFTLKVAEDQLSDLKDGEFYYHEIIGLDVYEGE-ELIG 172 

K F +++FK ++N++E+ K +KV E++L +L +GEFY+HEIIG +V+ E ELIG 
Sbjct: 61 HKQFHLLQFKERQNLNEVEELKNAIIKVPEEELGELNEGEFYFHEIIGCEVFTEEGELIG 120 

Query: 173 KIKEILQPGANDVWVVERHGKRDLLLPYIPPVVLEVDLSNQRVQVELMEGLDDE 226

K+KEIL PGANDVWV+ R GK+D L+PYI W +D+ +++++ELMEGL DE
Sbjct: 121 KVKEILTPGANDVWVIGRKGKKDALIPYIESVVKHIDVREKKIEIELMEGLIDE 174

A related DNA sequence was identified in S.pyogenes <SEQ ID 6087> which encodes the amino acid
sequence <SEQ ID 6088>. Analysis of this protein sequence reveals the following:

Possible site: 28 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2787 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 133/172 (77%) , Positives = 153/172 (88%) 

Query: 56 MEYFNVGKIVNTQGLQGEMRVLSVTDFVEERFKKGQVLALFDEKNQFVMDIEIASHRKQK 115
MEYFNVGKIVNTQGLQGEMRVLSV+DF EERFKKG LALFD+K++FV ++ I SHRKQK
Sbjct: 1 MEYFNVGKIVNTQGLQGEMRVLSVSDFAEERFKKGSQLALFDDKDRFVQEVTIVSHRKQK 60

Query: 116 NFDIIKFKGMYHINDIEKYKGFTLKVAEDQLSDLKDGEFYYHEIIGLDVYEGEELIGKIK 175

+FDIIKFK MYHIN IEKYKG+TLKV++D DL++GEFYYH+IIG+ VYE + LIG +K
Sbjct: 61 HFDIIKFKDMYHINAIEKYKGYTLKVSKDNQGDLQEGEFYYHQIIGMAVYEKDVLIGHVK 120

Query: 176 EILQPGANDVWVVERHGKRDLLLPYIPPVVLEVDLSNQRVQVELMEGLDDED 227

EILQPGANDVW+V+R GKRDLLLPYIPPVVL VD+ N+RV VELMEGLDDED
Sbjct: 121 EILQPGANDVWIVKRCGKRDLLLPYIPPVVLNVDVPNKRVDVELMEGLDDED 172

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1964 

A DNA sequence (GBSx2073) was identified in S.agalactiae <SEQ ID 6089> which encodes the amino
acid sequence <SEQ ID 6090>. This protein is predicted to be similar to E. coli ykfC (11). Analysis of this
protein sequence reveals the following:

Possible site: 55 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3488 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9909> which encodes amino acid sequence <SEQ ID 9910>
was also identified.



The protein has homology with the following sequences in the GENPEPT database. 
>GP:AAC38715 GB:AF030367 maturase-related protein [Streptococcus pneumoniae]
Identities = 366/425 (86%), Positives = 396/425 (93%)

Query: 12 MSELLDKILSRNNMLEAYKQVKSNKGSAGINGVTIEQMDDYLHQNWRETKQLIKERSYKP 71

MS+LLDKILSR NMLEAY QVKSNKGSAGI+G+TIE+MD+YL QNWR TK+LIK+R YKP
Sbjct: 1 MSKLLDKILSRENMLEAYNQVKSNKGSAGIDGMTIEEMDNYLRQNWRLTKELIKQRKYKP 60

Query: 72 QPVLRVEIPKPNGGVRNLGIPTAMDRMIQQAIVQVLSPLCEKHFSEYSYGF 131

QPVL+VEIPKP+GG+R LGIPT MDRMIQQAIVQV+SP+CE HFS+ SYGFRPNRSCE A
Sbjct: 61 QPVLKVEIPKPDGGIRQLGIPTVMDRMIQQAIVQVMSPICEPHFSDTSYGFRPNRSCEKA 120

Query: 132 IVQLLEYLNDGYEWIVDIDLEKFFDTVPQDRLMSLVHNIIQDGDTESLIRKYLHSGVVIN 191

I++LLEYLNDGYEWIVDIDLEKFFDTVPQDRLMSLVHNII+DGDTESLIRKYLHSGV+IN
Sbjct: 121 IMKLLEYLNDGYEWIVDIDLEKFFDTVPQDRLMSLVHNIIEDGDTESLIRKYLHSGVIIN 180

Query: 192 GQRHKTLVGTPQGGNLSPLLSNIMLNELDKGLEKRGLRFVRYADDCVITVGSEAAAKRVM 251

GQR+KTLVGTPQGGNLSPLLSNIMLNELDK LEKRGLRFVRYADDCVITVGSEAAAKRVM
Sbjct: 181 GQRYKTLVGTPQGGNLSPLLSNIMLNELDKELEKRGLRFVRYADDCVITVGSEAAAKRVM 240

Query: 252 HSVSSYIEKRLGLKVNMTKTKIVRPNKLKYLGFGFWKSPKGWKCRPHQDSVQSFKRKLKQ 311

+SVS +IEKRLGLKVNMTKTKI RP +LKYLGFGFWKS  GWK RPHQDSV+ FK KLK+
Sbjct: 241 YSVSRFIEKRLGLKVNMTKTKITRPRELKYLGFGFWKSSDGWKSRPHQDSVRRFKLKLKK 300

Query: 312 LTMRKWSIDLITRIERIJ^WVIRGWINYFSLGNMKSIMTQIDERLRTRIRVIIWKQWKKKA 371

LT RKWSIDL RIE+LN IRGWINYFSLGNMKSI + IDERLRTR+R+IIWKQWKKK+
Sbjct: 301 LTQRKWSIDLTRRIEQLNLSIRGWINYFSLGNMKSIVASIDERLRTRLRMIIWKQWKKKS 360

Query: 372 KRLWGLLKLGVARWIADKVSGWGDHYQLVAQKSVLKRAISKPAIAKRGLVSCLDYYLERH 431

+RLWGLLKLGV +WIADKVSGWGDHYQLVAQKSVLKRAISKP L KRGLVSCLDYYLERH
Sbjct: 361 RRLWGLLKLGVPKWIADKVSGWGDHYQLVAQKSVLKRAISKPVLEKRGLVSCLDYYLERH 420

Query: 432 ALKVS 436

ALKVS
Sbjct: 421 ALKVS 425





No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1965 

A DNA sequence (GBSx2074) was identified in S.agalactiae <SEQ ID 6091> which encodes the amino 
acid sequence <SEQ ID 6092>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -0.37 Transmembrane 7 - 23 ( 7 - 23) 

Final Results 

bacterial membrane --- Certainty=0.1150 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 821> which encodes the amino acid
sequence <SEQ ID 822>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -2.87 Transmembrane 1157 -1173 (1157 -1174) 



Final Results 

bacterial membrane --- Certainty=0.2147 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 1031/1064 (96%) , Positives = 1042/1064 (97%) 

Query: 1 MRKKQKLPFDKLAIALISTSILLNAQSDIKANTVTEDTPATEQAVEPPQPIAVSEESPSS 60

+RKKQKLPFDKLAIAL+STSILLNAQSDIKANTVTEDTPATEQAVE PQP AVSEE+PSS
Sbjct: 1 LRKKQKLPFDKLAIALMSTSILLNAQSDIKANTVTEDTPATEQAVETPQPTAVSEEAPSS 60

Query: 61 KETKTSQTPSDVGETOADDANDLAPQAPAKTADTPATSKATIRDLNDPSHVKTLQEKAGK 120
KETKT QTP D  ET+ADDANDLAPQAPAKTADTPATSKATIRDLNDPS VKTLQEKAGK
Sbjct: 61 KETKTPQTPDDAEETIADDANDLAPQAPAKTADTPATSKATIRDLNDPSQVKTLQEKAGK 120

Query: 121 GVGTVVAVIDAGFDKNHEAWRLTDKTKARYQSKENLEKAKKEHGITYGEWVNDKVAYYHD 180
G GTVVAVIDAGFDKNHEAWRLTDKTKARYQSKE+LEKAKKEHGITYGEWVNDKVAYYHD
Sbjct: 121 GAGTVVAVIDAGFDKNHEAWRLTDKTKARYQSKEDLEKAKKEHGITYGEWVNDKVAYYHD 180

Query: 181 YSKDGKNAVDQEHGTHVSGILSGNAPSEMKEPYRLEGAMPEAQLLLMRVEIVNGLADYAR 240

YSKDGK AVDQEHGTHVSGILSGNAPSE KEPYRLEGAMPEAQLLLMRVEIVNGLADYAR
Sbjct: 181 YSKDGKTAVDQEHGTHVSGILSGNAPSETKEPYRLEGAMPEAQLLLMRVEIVNGLADYAR 240

Query: 241 NYAQAIRDAVNLGAKVINMSFGNAALAYANLPDETKKAFDYAKSKGVSIVTSAGNDSSFG 300

NYAQAI DAVNLGAKVINMSFGNAALAYANLPDETKKAFDYAKSKGVSIVTSAGNDSSFG
Sbjct: 241 NYAQAIIDAVNLGAKVINMSFGNAALAYANLPDETKKAFDYAKSKGVSIVTSAGNDSSFG 300

Query: 301 GKPRLPLADHPDYGVVGTPAAADSTLTVASYSPDKQLTETATVKTDDHQDKEMPVLSTNR 360

GK RLPLADHPDYGVVGTPAAADSTLTVASYSPDKQLTETATVKT D QDKEMPVLSTNR
Sbjct: 301 GKTRLPLADHPDYGVVGTPAAADSTLTVASYSPDKQLTETATVKTADQQDKEMPVLSTNR 360

Query: 361 FEPNKAYDYAYANRGTKEDDFKDVEGKIALIERGDIDFKDKIANAKKAGAVGVLIYDNQD 420
FEPNKAYDYAYANRG KEDDFKDV+GKIALIERGDIDFKDKIANAKKAGAVGVLIYDNQD
Sbjct: 361 FEPNKAYDYAYANRGMKEDDFKDVRGKIALIERGDIDFKDKIANAKKAGAVGVLIYDNQD 420

Query: 421 KGFPIELPNVDQMPAAFISRRDGLLLKDNPQKTITFNATPKVLPTASGTKLSRFSSWGLT 480
KGFPIELPNVDQMPAAFISR+DGLLLK+NPQKTITFNATPKVLPTASGTKLSRFSSWGLT
Sbjct: 421 KGFPIELPNVDQMPAAFISRKDGLLLKENPQKTITFNATPKVLPTASGTKLSRFSSWGLT 480

Query: 481 ADGNIKPDIAAPGQDILSSVANNKYAKLSGTSMSAPLVAGIMGLLQKQYETQYPDMTPSE 540

ADGNIKPDIAAPGQDILSSVANNKYAKLSGTSMSAPLVAGIMGLLQKQYETQYPDMTPSE
Sbjct: 481 ADGNIKPDIAAPGQDILSSVANNKYAKLSGTSMSAPLVAGIMGLLQKQYETQYPDMTPSE 540

Query: 541 RLDLAKKVLMSSATALYDEDEKAYFSPRQQGAGAVDAKKASAATMYVTDKDNTSSKVHLN 600

RLDLAKKVLMSSATALYDEDEKAYFSPRQQGAGAVDAKKASAATMYVTDKDNTSSKVHLN
Sbjct: 541 RLDLAKKVLMSSATALYDEDEKAYFSPRQQGAGAVDAKKASAATMYVTDKDNTSSKVHLN 600

Query: 601 NVSDKFEVTVTVHNKSDKPQELYYQVTVQTDKVDGKHFALAPKALYETSWQKITIPANSS 660

NVSDKFEVTVTVHNKSDKPQELYYQ TVQTDKVDGK FALAPKALYETSWQKITIPANSS
Sbjct: 601 NVSDKFEVTVTVHNKSDKPQELYYQATVQTDKVDGKLFALAPKALYETSWQKITIPANSS 660

Query: 661 KQVTVPIDASRFSKDLLAQMKNGYFLEGFVRFKQDPTKEELMSIPYIGFRGDFGNLSALE 720
KQVT+PID S+FSKDLLA MKNGYFLEGFVRFKQDPTKEELMSIPYIGFRGDFGNLSALE
Sbjct: 661 KQVTIPIDVSQFSKDLLAPMKNGYFLEGFVRFKQDPTKEELMSIPYIGFRGDFGNLSALE 720

Query: 721 KPIYDSKDGSSYYHEANSDAKDQLDGDGLQFYALKNNFTALTTESNPWTIIKAVKEGVEN 780
KPIYDSKDGSSYYHEANSDAKDQLDGDGLQFYALKNNFTALTTESNPWTIIKAVKEGVEN
Sbjct: 721 KPIYDSKDGSSYYHEANSDAKDQLDGDGLQFYALKNNFTALTTESNPWTIIKAVKEGVEN 780

Query: 781 IEDIESSEITETIFAGTFAKQDDDSHYYIHRHANGKPYAAISPNGDGNRDYVQFQGTFLR 840

IEDIESSEITETIFAGTFAKQDDDSHYYIHRHANGKPYAAISPNGDGNRDYVQFQGTFLR
Sbjct: 781 IEDIESSEITETIFAGTFAKQDDDSHYYIHRHANGKPYAAISPNGDGNRDYVQFQGTFLR 840

Query: 841 NAKNLVAEVLDKEGNVVWTSEVTEQVVKNYNNDLASTLGSTO 900

NAKNLVAEVLDKEGNVVWTSEVTEQVVKNYNNDLASTLGSTRFEKTRWDGK+KDGKWAN
Sbjct: 841 NAKNLVAEVLDKEGNVVWTSEVTEQVVKN^ 900

Query: 901 GTYTYRVRYTPISSGAKEQHTDFDVIVDNTTPEVATSATFSTEDSRLTLASKPKTSQPVY 960

GTYTYRVRYTPISSGAKEQHTDFDVIVDNTTPEVATSATFSTED RLTLASKPKTSQPVY
Sbjct: 901 GTYTYRVRYTPISSGAKEQHTDFDVIVDNTTPEVATSATFSTEDRRLTLASKPKTSQPVY 960

Query: 961 RERIAYTYMDEDLPTTEYISPNEDGTFTLPEEAETMEGATVPLKMSDFTYVVEDMAGNIT 1020

RERIAYTYMDEDLPTTEYISPNEDGTFTLPEEAETMEGATVPLKMSDFTYVVEDMAGNIT
Sbjct: 961 RERIAYTYMDEDLPTTEYISPNEDGTFTLPEEAETMEGATVPLKMSDFTYVVEDMAGNIT 1020

Query: 1021 YTPVTKLLEGHSNKPEQDGSDQAPDKKPEAKPEQDGSGQTPDKK 1064

YTPVTKLLEGHSNKPEQDGSDQAPDKKPE KPEQDGSGQ PDKK
Sbjct: 1021 YTPVTKLLEGHSNKPEQDGSDQAPDKKPETKPEQDGSGQAPDKK 1064

A related GBS gene <SEQ ID 8941> and protein <SEQ ID 8942> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: 5.69
GvH: Signal Score (-7.5): -3.33
Possible site: 25

>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -0.37 threshold: 0.0

INTEGRAL Likelihood = -0.37 Transmembrane 7 - 23 ( 7 - 23)
PERIPHERAL Likelihood = 2.81 508
modified ALOM score: 0.57

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.1150 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
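
The localisation call in each "Final Results" block is simply the compartment with the highest Certainty value. The following is a minimal Python sketch of reading such a block, written for illustration only: the names `parse_final_results` and `most_likely_location` are hypothetical, and the regular expression assumes the line layout seen in this document, including stray spaces inside numbers.

```python
import re

# Tolerates OCR-style spacing inside the number, e.g. "0 . 1150" or "0. 0000".
_RESULT = re.compile(
    r"bacterial\s+(\w+).*?Certainty\s*=\s*([0-9][0-9. ]*?)\s*\(([^)]*)\)"
)


def parse_final_results(text: str) -> dict:
    """Map each compartment name to a (certainty, verdict) pair."""
    results = {}
    for match in _RESULT.finditer(text):
        location, raw_value, verdict = match.groups()
        certainty = float(raw_value.replace(" ", ""))
        results[location] = (certainty, verdict.strip())
    return results


def most_likely_location(results: dict) -> str:
    """Return the compartment with the highest certainty."""
    return max(results, key=lambda loc: results[loc][0])


# Example block in the layout used throughout this document.
EXAMPLE = """\
bacterial membrane --- Certainty=0 . 1150 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0. 0000 (Not Clear) < succ>
"""

parsed = parse_final_results(EXAMPLE)
best = most_likely_location(parsed)
print(best, parsed[best])
```

Ties, such as a block in which every compartment is reported as 0.0000, are resolved here by first occurrence, which is an arbitrary choice of this sketch.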

SEQ ID 8942 (GBS276) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 46 (lane 2; MW 123kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 60 (lane 5; MW 46.5kDa).

The GBS276-His fusion product was purified (Figure 206, lane 9) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 296), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1966 

A DNA sequence (GBSx2075) was identified in S.agalactiae <SEQ ID 6093> which encodes the amino 
acid sequence <SEQ ID 6094>. Analysis of this protein sequence reveals the following: 

Possible site: 58

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4286 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1967

A DNA sequence (GBSx2076) was identified in S.agalactiae <SEQ ID 6095> which encodes the amino 
acid sequence <SEQ ID 6096>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.15 Transmembrane 19 - 35 ( 11 - 39) 

Final Results 

bacterial membrane --- Certainty=0.5458 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9911> which encodes amino acid sequence <SEQ ID 9912>
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 6096 (GBS654) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 142 (lane 8 & 10; MW 51.2kDa + lane 9; MW 27kDa). Purified GBS654-GST is
shown in Figure 245, lane 11.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1968 

A DNA sequence (GBSx2077) was identified in S.agalactiae <SEQ ID 6097> which encodes the amino 
acid sequence <SEQ ID 6098>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.4174 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9913> which encodes amino acid sequence <SEQ ID 9914> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF27324 GB:AF178424 unknown [Lactococcus lactis] 
Identities = 26/75 (34%) , Positives = 45/75 (59%) , Gaps = 4/75 (5%) 

Query: 11 MAFEPKNSELTKVLKES-LDEEKKEIFSSEMNIRDFERTKQYQFTLQPSVRKKIDRLSKE 69 

MAF+ +++VLSL + KE+ I EKY FTL+PSV++ +++L+++ 

Sbjct: 1 MAFDvDDKKVKTVLSNSSLAKSKVEL PKKIESEENKKSYSFTLEPSVKEGLEKLAEK 57 

Query: 70 KGYRSASSFINDFFK 84 

+ Y++ S F+ND K 
Sbjct: 58 QNYKNTSQFLNDLIK 72 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1969 

A DNA sequence (GBSx2078) was identified in S.agalactiae <SEQ ID 6099> which encodes the amino
acid sequence <SEQ ID 6100>. This protein is predicted to be ParA. Analysis of this protein sequence
reveals the following:

Possible site: 45 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF27325 GB:AF178424 ParA [Lactococcus lactis] 
Identities = 49/104 (47%) , Positives = 72/104 (69%) 

Query: 22 LSERLEEFKTEAFDFKTRASYVTAKLFFLGNMIKHNTNSSKELIRSLKNDKSVLAMIPHK 81
L ERL+ FK E D +TR +Y+TA +F+GN I+HNT SS+E + DK +AMIP K

Sbjct: 157 LIERLQNFKDEVIDARTRETYITAIPYFVGNRIRHNTKSSREFSEKISQDKGTIAMIPEK 216 

Query: 82 ELFNRSTLDKKSLSYMMSDKELYSRDSKFFKEIDFTFRKITDKL 125 
ELFNRSTLD L M DK++++ + F+++++F F +IT+K+ 
Sbjct: 217 ELFNRSTLDGVPLVEMEKDKDVFNSNKVFYEKLNFAFNEITOKI 260

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1970

A DNA sequence (GBSx2079) was identified in S.agalactiae <SEQ ID 6101> which encodes the amino 
acid sequence <SEQ ID 6102>. This protein is predicted to be transposase (orfA). Analysis of this protein 
sequence reveals the following: 

Possible site: 42 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2830 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1971

A DNA sequence (GBSx2080) was identified in S.agalactiae <SEQ ID 6103> which encodes the amino
acid sequence <SEQ ID 6104>. This protein is predicted to be transposase (orfB). Analysis of this protein 
sequence reveals the following: 

Possible site: 16 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2618 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB90834 GB:AJ250837 putative transposase [Streptococcus dysgalactiae] 
Identities = 242/259 (93%) , Positives = 249/259 (95%) 



[Only the match lines of this alignment are legible; the Query and Sbjct rows, which begin at residues 1, 61, 121, 181 and 241, are not recoverable.]

MCRWLN+P SSYYY+AVE VSE E EE+IK IFL+S++RYGSRKIKICLNNEGITLSRRR

IRRIMKRLNLVS VYQKATFKPHSRGKNEAP I PNHLDRQFK ERPLQAL VTDLTYVRVGNR

WAYVCLI I DLYNRE I IGLSLGWHKTAEL VKQAIQS I PY LTKVKMFHSDR KEF+NQLID

EILEAFGITRSLSQAGCPYDNAVAESTYRAFKIEFVYQETFQ LEELALKTK YVHWWNY

HRIHGSLNYQTPMTKRLIA

There is also homology to SEQ ID 32. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1972

A DNA sequence (GBSx2081) was identified in S.agalactiae <SEQ ID 6105> which encodes the amino
acid sequence <SEQ ID 6106>. Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3325 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1973

A DNA sequence (GBSx2082) was identified in S.agalactiae <SEQ ID 6107> which encodes the amino
acid sequence <SEQ ID 6108>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4442 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9917> which encodes amino acid sequence <SEQ ID 9918> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

10 >GP:AAD44095 GB:AF115103 orf359 gp [Streptococcus thermophilus 

bacteriophage Sfi21] 
Identities = 92/357 (25%) , Positives = 162/357 (44%) , Gaps = 33/357 (9%) 

Query: 45 RKNQYGKTFETMKEAYDELTOIKYEFANKVSLENyNMTFENYMNKIYLRAYKQK-VQSVT 103 
RK + FT EA ++ + + V+++ ++T +Y K + YK+ V +T 

Sbjct: 24 RKPKTKGGFRTKSEAIKAAAEMELKLQDNVNVDE-DITLYDYF-KQWCEVYKKPTVSKIT 81 

Query: 104 YKTALPHHKLFIQYFGLKPLKAITPRDCEAFRLHIIENYSENYAKNLWSRF KACMG 159 

YK + + +FG K LK+IT + + ++ +Y++ +A++ RF KAC+ 

Sbjct: 82 YKAYINSQRKIELFFGDKKLKSITATEYQ RVLNSYAKTHAQDTVERFNVHVKACIE 137 

Query: 160 YAERLGYISNMPCKALD NPRGKHPETPFWTYAEFQTFIKSFDLHDYEELQRFTAIWL 216 

A GYI CK +G+ ET F E++ I ++ + E + A+++ 

Sbjct: 138 MAVHEGYIKRNFCKFAKINAKNKGRDIETKFLEVEEYERLI - - YETSKHPEYASYAALYI 195 

Query: 217 YYMTGVRVSEGLSLCWEDIDFDKKFLKVHTTLEKDENGNWYRKDQTKTPAGERLIELDDI 276 

TG+R +E L L +DI D L V+ T + N + TKT + R I LDD 

Sbjct: 196 IAKTGIRFAECLGLTVDDIKRDTGMLSWKTWDYKNNTGFM PTKTKSSIREIPLDDE 252 

Query: 277 TIEVLQVWRKNQFANQDTDFIISRFGDPFCKSTICRIIKRKAQQVGVPVITGKGLRHSHA 336 
I + +Q D 1+ + T+ +1+ R+ + LRH++A 

Sbjct: 253 FINFI DQLPPTDDGRILPSLSNNAVNKTLRKIVGRE VRVHSLRHTYA 299 

Query: 337 S YLINVLKKD I LYVARRMGHADKSTTLNTYSHWFNALDKTVSEE I TQNI KSAGLDS I 393 
SYLI D++ V++ +GH + + TL Y+H E+I Q G +++ 

Sbjct: 300 SYLI-AHDIDLISVSQVLGHENI^ITLEWAHQLQEQKSRNDEKIKQMWTECGRNAL 355 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6109> which encodes the amino acid 
sequence <SEQ ID 6110>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.5549 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 111/127 (87%) , Positives = 119/127 (93%) 

Query: 242 LKVHTTLEKDENGNWYRKDQTKTPAGERLIELDDITIEVLQVWRKNQFANQDTDFIISRF 301 

LKVHTTLEKDENGNWYRKDQTKTPAGERLIELDD+TI VL+ WR+NQ N DTDFIISRF 
Sbjct: 1 LKVHTTLEKDENGNWYRKDQTKTPAGERLIELDDVTIVVLENWRRNQVVNTDTDFIISRF 60 

Query: 302 GDPFCKSTICRIIKRKAQQVGVPVITGKGLRHSHASYLINVLKKDILYVARRMGHADKST 361 

G+PFCKSTICR+IK KAQ +GVPVITGKGLRHS+ASYLINVLKKDILYVA+ MGHADKST 
Sbjct: 61 GEPFCKSTICRVIKHKAQSIGVPVITGKGLRHSYASYLINVLKKDILYVAKCMGHADKST 120 

Query: 362 TLNTYSH 368 
TLNTYSH 

Sbjct: 121 TLNTYSH 127 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
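
Each alignment in these examples is introduced by a summary line giving identities, positives and (where present) gaps as a count over the aligned length, followed by a reported percentage. The short Python sketch below shows one way such a line could be read into numbers for comparison across examples. The function name `parse_alignment_summary` and the regular expression are illustrative assumptions, and the reported percentages are taken as printed rather than recomputed.

```python
import re

# Hypothetical pattern for "Identities = 92/357 (25%)" style fields.
_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*\.?\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_alignment_summary(line):
    """Map each field name to (count, aligned_length, reported_percent)."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in _FIELD.findall(line)
    }


summary = parse_alignment_summary(
    "Identities = 92/357 (25%) , Positives = 162/357 (44%) , Gaps = 33/357 (9%)"
)
print(summary["Identities"])
```

A summary line without a gaps field, such as those given for the GAS/GBS alignments, simply yields a dictionary with no "Gaps" key.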

Example 1974 

A DNA sequence (GBSx2083) was identified in S.agalactiae <SEQ ID 6111> which encodes the amino 
acid sequence <SEQ ID 6112>. Analysis of this protein sequence reveals the following: 

Possible site: 20 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3299 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1975 

A DNA sequence (GBSx2084) was identified in S.agalactiae <SEQ ID 6113> which encodes the amino 
acid sequence <SEQ ID 6114>. This protein is predicted to be repressor protein-related protein. Analysis of 
this protein sequence reveals the following: 
Possible site: 32 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2721 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9919> which encodes amino acid sequence <SEQ ID 9920> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC98432 GB:L29324 repressor protein [Streptococcus pneumoniae] 
Identities = 38/65 (58%) , Positives = 52/65 (79%) , Gaps = 1/65 (1%) 

Query: 2 MYRRLRDLREDNDFTQKYVAEK-LSFTHSAYSKIERGERILSADVIIKLSNLYNVSTDYL 60 

M +R+RDLRED+D TQ+YVA+ L+ T SAYSK+E G R++S D +IKL++ YNVS DYL 
Sbjct: 1 MLKRIRDLREDDDLTQEYVAKTILNCTRSAYSKMESGTRLISIDDLIKLADFYNVSLDYL 60 

Query: 61 LGQTD 65 

+G+ D 
Sbjct: 61 VGRVD 65 

There is also homology to SEQ ID 582. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1976 

A DNA sequence (GBSx2085) was identified in S.agalactiae <SEQ ID 6115> which encodes the amino 
acid sequence <SEQ ID 6116>. This protein is predicted to be relaxase. Analysis of this protein sequence 
reveals the following: 

Possible site: 40 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3160 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC98434 GB:L29324 relaxase [Streptococcus pneumoniae] 
Identities = 223/417 (53%) , Positives = 310/417 (73%) , Gaps = 5/417 (1%) 



Query: 1 MVITKHYAVHGKKYRRQLIKYILDPKKTRNLSLISDFGMSNYLDFPDYVELVKMYQNNFL 60 

MVITKH+A+HGK YR +LIKYIL+P KT+NL+L+SDFGM NYLDFP Y ELVKMY +NFL 
Sbjct: 1 mvitkhfaihgknyrsklik^ii^psktknltlvsdfgmrnyldfpsykelvki™dnfl 60 

Query: 61 SNDQLYDSRFDRQEKKQQKIHAHHIIQSFSPEDKLSPEEINRIGYETIKELIGGQYKFIV 120 

SND LY+ R DRQE Q+KIH+HHIIQSFSP+D L+PE+INRIGYE KEL GG+++FIV 
Sbjct: 61 SNDTLYEFRHDRQEvNQRKIHSHHIIQSFSPDDHLTPEQINRIGYEAAKELTGGRFRFIV 120 

Query: 121 ATHVDQDHCHNH 1 1 INS INSQSQKKLKWDYALERNLQM I SDRI S KVAGAKI I PPKRYSHR 180 

ATHVD+ H HNHII+NSI+ S KK WDY E NL+M+SDR+SK+AGAKII RYSHR 
Sbjct: 121 ATHVDKGHIHNHI ILNS IDQNSDKKFLWDYKAEHNLRMVSDRLSKIAGAKI I -ENRYSHR 179 

Query: 181 DYEVYRRSNHKYELKQRLFFLTffiHSIDFNDFMQKAEQLNVKIDFSRKHSRFFMTDRNMKQ 240 

YEVYR+ +N+ KYE+KQR+ + FL+E+S +F D +KA+ L++KIDF KH +FMTD NMKQ 
Sbjct: 180 QYEVYRKTNYK^EIKQRVYFLIENSKNFEDLKKKAKALHLKIDFRHKHVTYFMTDSNMKQ 239 

Query: 241 VICGDKUJKREPYSKEYFQRYFAKKKIELILEFLLLRSNSFDDLVEKARLLGLELKSKKK 300 

V++ KL++++PY++ YF++ F +++I ILEFLL + + ++L+++A + GL++ K+K 
Sbjct: 240 WRDSKLSRKQPYNETYFEKKFVQREI INILEFLLPKMKNMNELIQRAEVFGLKI I PKEK 299 

Query: 301 TIDFVLSDGKSCISIPNKSLRKKNLYDTTYFDSYFKEHDVFEVLHNNEVKIEFEKFETQQ 360 

+ F DG I++LK NLY +YF YF + VLN + + + + + 
Sbjct: 300 HVLFEF - DG 1 KLAEQEbVKSNLYSVS YFQDYFNNKNETFVLDNKNL VELYNEEKI I K 355 

Query: 361 LSEILTVEEITEAYETYKTKRDAVHEFEVEITEEQIEKIVLDGLFVKVWMGIGQEGL 417 

E+ + E + ++Y+ +K RDAVHEFEVE+ QIE++V G+++KV GI ++ L 
Sbjct: 356 EKELPSEEMVWKSYQDFKRNRDAVHEFEVELNLNQIEEVVEHGIYIKVQFGIDKKDL 412 



A related DNA sequence was identified in S. pyogenes <SEQ ID 6117> which encodes the amino acid 
sequence <SEQ ID 6118>. Analysis of this protein sequence reveals the following: 

Possible site: 20 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3114 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 103/218 (47%) , Positives = 170/218 (77%) 

Query: 393 EEQIEKIVLDGLFVKVWMGIGQEGLIFIPNHQIiNILEQENKKQYQVFIRETSSYFIYHKE 452 

E QIE+++ + +++KV + Q GLIFIPN+QL+I ++EN K+Y+V+IRET+ +FIY+KE 
Sbjct: 2 EHQIERLIAEDIYIKVSFSVKQSGLIFIPNYQLDIRKEENHKKYKVYIRETAQFFIYNKE 61 
Query: 453 DSEMNRFMKGRDLIRQLTFDNKSLPYKRRISLVSLQQKIEEINLLMTLNIQNKSFLELKD 512 

SE+NR+M+G +LI QLT D+KS+P +RR ++ +L++KIEEI+LL+ L+ +NK + ++KD 
Sbjct: 62 ASELNRYMRGHELICQLXNDSKSIPKRRRQTIDTLKKKIEEISLLIEIiDTENKPYQDIKD 121 

Query: 513 ELVGDIAQLDIELTNLQDKOTTJ^KMAEVVv^ 572 

++V D+AQLD+ +T LQD LNK+AEV++NL +++ + ++LA+Y+ +KMNL+ + I 
Sbjct: 122 DIVKDMAQLDLTITELQDHIAHMKVAEVL)^ 181 

Query: 573 QIESEIEMIQNQLDNKIEEYENAVRKLDEYVRVLNMDK 610 
++E EIE QN+L+ I+EYE VR+L+++ +L+ K 

Sbjct: 182 EVEKEIETSQNEIjNISIDEYEYLVRRLEKFGEILSDSK 219 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1977 

A DNA sequence (GBSx2086) was identified in S.agalactiae <SEQ ID 6119> which encodes the amino 
acid sequence <SEQ ID 6120>. Analysis of this protein sequence reveals the following: 






Possible site: 40 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.4006 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC98436 GB:L29324 unknown [Streptococcus pneumoniae] 
Identities = 53/115 (46%) , Positives = 77/115 (66%) , Gaps = 2/115 (1%) 

Query: 5 VREIRKEVNFSIEEYQQIQNFMEQEGYEQFSPFARGKLLKIDHQPSQQLEEWIKYLQHQK 64 

+R IRK+ + E +QI + M ++G + FS F R LL D Q +Q+E+W + QK 
Sbjct: 5 IRS IRKQFRLTETEEKQILDLMREKGDDNFSDFLRKSLLLSDGQ - - KQMEKWFNLWKKQK 62 

Query: 65 VEQIYRDTOEILVLAKLSQSVTMEHLEIILTCIKDLMKEIEVTIPLSYSFKDKYM 119 
+EQI RDVHE+ ++AK + VT EH+ I+LTCI++L+KE+E T PLS F +KYM 

Sbjct: 63 LEQISRDVHEWIIAKTNHQVTHEHVSILLTCIQELIKEVEKTGPLSEDFCNKYM 117 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1978 

A DNA sequence (GBSx2087) was identified in S.agalactiae <SEQ ID 6121> which encodes the amino 
acid sequence <SEQ ID 6122>. This protein is predicted to be TnpA. Analysis of this protein sequence 
reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2935 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC82523 GB:AF027768 TnpA [Serratia marcescens] 
Identities = 176/413 (42%) , Positives = 243/413 (58%) , Gaps = 18/413 (4%) 



Query: zo Q A 

M F+V+ V P C ECG + + R+ DLPI KRV L + RRRi CK L +T 
Sbjct: 1 MHFQVD-VPDPIACEECGVQGEFVRFGKRDVPYRDLPIHGKRVTLWVWRRYTCRACKTT 59 

Query: 85 Id VJJiiJ^oMiiU<JjJjKbiybybiyiolS.lr VhiVAlioVLjVlJJiKl lKJMVr IsUx VAJ-tiSJiKiL 138 

VD R MT RL + ++++S + + VA G+DEKT+R++F R 
Sbjct: 60 FRPQLPEMVDGFR-MTLRLHEYVEKESFl^P^ 118 

Query: 139 iQFEi PKWLCjIDEIHI IRRPRLVLTNIERRTI YD1 KPNRNKh. 1 VlQKlibJilblJKl X Ih, XV 198 

++FETP+ LGIDE+++ +R R +LTNIE RT+ D+ R ++ V L ++ DR +E V 
Sbjct: 119 HRFETPRI LGIDELYLNKRYRC I LTNI EERTLLDLIATRRQD WTNYLMKLKDRQKVE I V 178 

Query: 199 T^^KPYKDAvl^ILPQAK^ATVDKFHVvRMANQAIiDNvRKSLKAHMSQKERRTLMRERF 258 

+MDMW PY+ AV +LPQA++WDKFHWRMAN AL+ VRK L+ + + RTL +R 
Sbjct: 179 SMD^MPYRAAVKAVI J PQARIVVDKFHWRMANDALERVRKGLRKELKPSQSRTLKGDRK 238 

Query: 259 ILLKRKHDIJsTERESFLLDTWLGNLPALKEAYELKEEFYWIWDTPDPDEGHLRYSQWRHRC 318 

ILLKR H++++RE +++TW G PL AYE KE FY IWD + +W 
Sbjct: 239 ILLKRAHEVSDRERLIMETWTGAFPQLIjAAYEHKERFYGIWDATTRLQAEAAIjDEW-IAT 297 

Query: 319 MSSNSKDAYKDLVRAVDNWHVE I FNYF - - DKRLTNAYTES INS I IRQVERMGRGYS FDAL 376 

+ K+ + DLVRAV NW E YF D +TNAYTESIN + + R GRGYSF+ + 
Sbjct: 298 IPKGQKEWSDLVRAVGl!MREETMTYFETDMPVTNAYTESINRLAKDKNREGRGYSFEVM 357 

Query: 377 RAKILFNEKLHKKRKPRFNSSAFNKAMLYDTFNWYEVNDHDITDNLGVDFSTL 429 

RA++L+ K HKK+ P SFK+ Y + D N GVD ST+ 
Sbjct: 358 RARMLYTTK-HKKKAPTAKVSPFYKKTI GYGLPDFAEELNYGVDLSTI 404 





No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1979 

A DNA sequence (GBSx2088) was identified in S.agalactiae <SEQ ID 6123> which encodes the amino 
acid sequence <SEQ ID 6124>. This protein is predicted to be mercuric reductase. Analysis of this protein 
sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2115 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA70224 GB:Y09024 mercuric reductase [Bacillus cereus] 
Identities = 412/546 (75%) , Positives = 484/546 (88%) 

Query: 1 ^KFKVNISGMTCTGCEKHVESALEKIGAKNIESSYRRGEAVFELPDDIEVESAIKAIDE 60 

M K++V++ GMTCTGCE+HV ALE +GA IE +RRGEAVFELP+ + VE+A KAI + 
Sbjct: 1 MKKYRVDVQGMTCTGCEEHVAVALENMGATGIEVDFRRGEAVFELPNALGVETAKKAISD 60 

Query: 61 ANYQAGEIEEVSSLENVALINEDNYDLLIIGSGAAAFSSAIKAIEYGAKVGMIERGTVGG 120 

A YQ G+ EEV S E V L NE +YD +IIGSG AAFSSAI+A++YGAKV MIERGT+GG 
Sbjct: 61 AKYQPGKAEEVQSQEMVQLGNEGDYDYI I IGSGGAAFSSAIEAVKYGAKVAMIERGTIGG 120 

Query: 121 TCTNIGCTPSKTLLRAGEINHLSKDNPFIGLQTSAGEVTJLASLITQKDKLVSELRNQKYM 180 

TCVNIGCVPSKTLLRAGEINHL+K+NPF+GL TSAGEVDLA LI QK++LV+ELRN KY+ 
Sbjct: 121 TCVNIGCTPSKTLLRAGEINHIiAKNNPFVGLHTSAGEVDLAPLIKQKNELOTELRNSKYV 180 
Query: 181 

DLID+Y F+LI +GEAKFVD TVEVNG +SAKRFLIATGASP+ P I GL ++DYLTST 
Sbjct: 181 

Query: 241 

+LLELKK+PKRL VIGSGYIGMELGQLFH+LGSE+TL+QRSERLLKEYDPEISESVEK+L 
Sbjct: 241 

Query: 301 

+EQGINLVKGAT+ER+EQ+G+IK+V+V VNG + +IE+DQLLVATGR PNT +LNL AAG 
Sbjct: 301 

Query: 361 

VE G EI+I+D+ +T+N +IYAAGDVTLGPQFVYVAAY+GG+ NAIGGLNKK++L 
Sbjct: 361 

Query: 421 

WP VTFT P +ATVGLTE+QAKE GY+VKTSVLPLDAVPRA+VNRETTGVFKLVAD++T 
Sbjct: 421 

Query: 481 

+KVLG H+V+ENAGDVIYAA+LAVKFGLT+4D+ ETLAPYLTMAEGLKL ALTFDKDISK 
Sbjct: 481 

Query: 541 

LSCCAG 
Sbjct: 541 



There is also homology to SEQ ID 1820. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1980 

A DNA sequence (GBSx2089) was identified in S.agalactiae <SEQ ID 6125> which encodes the amino 
acid sequence <SEQ ID 6126>. This protein is predicted to be regulatory protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4529 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA83973 GB:AF138877 mercury resistance operon negative regulator MerR1 [Bacillus sp. RC607] 
Identities = 83/129 (64%) , Positives = 104/129 (80%) 

Query: 1 MIYRISEFADKCGVNKETIRYYERKNLLQEPHRTEAGYRIYSYDDVKRVGFIKRIQEFGF 60 

M +RI E ADKCGVNKETIRYYER L+ EP RTE GYR+YS V R+ FIKR+QE GF 
Sbjct: 1 MKFRIGELADKCGVNKETIRYYERLGLIPEPERTEKGYRMYSQQTVDRLHFIKRMQELGF 60 

Query: 61 SLSEIYKLLGVVDKDEVRCQDMFEFVSKKQKEVQKQIEDLKRIETMLDDLKQRCPDEKKL 120 
+L+EI KLLGWD+DE +C+DM++F K +++Q++IEDLKRIE ML DLK+RCP+ K + 

Sbjct: 61 TIiNEIDKLLGVVDRDEAKCRDMYDFTILKIEDIQRKIEDLKRIERMLMDLKERCPENKDI 120 

Query: 121 HSCPIIETL 129 
+ CPIIETL 
Sbjct: 121 YECPIIETL 129 

There is also homology to SEQ ID 1712. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1981 

A DNA sequence (GBSx2090) was identified in S.agalactiae <SEQ ID 6127> which encodes the amino 
acid sequence <SEQ ID 6128>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -7.86 Transmembrane 80 - 96 ( 78 - 100) 

Final Results 

bacterial membrane Certainty=0.4142 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S. pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8943> and protein <SEQ ID 8944> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: -13.52 
GvH: Signal Score (-7.5): -6.14 

Possible site: 44 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 1 value: -7.86 threshold: 0.0 

INTEGRAL Likelihood = -7.86 Transmembrane 80 - 96 ( 78 - 100) 
PERIPHERAL Likelihood = 1.80 136 

modified ALOM score: 2.07 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.4142 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the databases: 

ORF02021(439 - 666 of 1080) 

GP|451734|gb|AAA18975.1||U05143 (9 - 46 of 46) envelope glycoprotein {Simian immunodeficiency virus} 
GP|451744|gb|AAA18980.1||U05148 envelope glycoprotein {Simian immunodeficiency virus} 
%Match = 3.2 

%Identity = 38.5 %Similarity = 64.1 

Matches = 15 Mismatches = 13 Conservative Sub.s = 10 



336 366 396 426 456 486 516 546 

RIPVQFKGCDDYYNEOTGYPLSRINLEHYLTEGGVLYFVWSKDVSP^ 

= 11 I : lhl = :||: 1 = 

WGLTGNAGTTPTATTTTTTPRWENVINESN 

10 20 30 
576 606 636 666 696 726 756 786 

LFWMAIIAKLLILPYPALQTSYKSRPCLRRSSLRKLTQIPFSIVTKVGNTNMKSIT^ 

II:: :| III 
PCIKDNSCAGLEQEP 

40 

SEQ ID 8944 (GBS415) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 79 (lane 3; MW 21.2kDa). 

Example 1982 

A DNA sequence (GBSx2092) was identified in S.agalactiae <SEQ ID 6129> which encodes the amino 
acid sequence <SEQ ID 6130>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3402 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1983 

A DNA sequence (GBSx2093) was identified in S.agalactiae <SEQ ID 6131> which encodes the amino 
acid sequence <SEQ ID 6132>. This protein is predicted to be ATPase. Analysis of this protein sequence 
reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial membrane Certainty=0.5034 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA22858 GB:M90750 cadmium-efflux ATPase [Bacillus firmus] 
Identities = 486/725 (67%) , Positives = 584/725 (80%) , Gaps = 18/725 (2%) 

Query: 1 MSRGKAKQSEKEMKAYRVC^FTCTNC^IFENNVKELPGVQDAKVNFGASKVYVKGTTTI 60 

MS KA SE+EMKAYRVQGFTC NCA FE NVK+L GV+DAKVNFGASK+ V G TI 
Sbjct: 1 MSDQKAITSEQEMKAYRVQGFTCANCaGKFEKNVXQLSGvEDAKVNFGASKIAvYGNATI 60 

Query: 61 EELEKAGAFENLKIRDEKEQRVGGE PFWKQKENI KVYI SALLLWSWFL 109 

EELEKAGAFENLK+ EK R + PF+K K + +Y S LL+ + 

Sbjct: 61 EELEKAGAFENLKVTPEKSARQASQEVKEDTKEDKVPFYK-KHSTLLYAS-LLITFGYLS 118 

Query: 110 GEQYGEEHVTjPTIGYAASILIGGYSLFIKGLKNLRRLNFDMNTLMTIAIIGAAIIGEWGE 169 
GEE+++ T+ + AS+ IGG SLF GL+NL R FDM TLMT+A+IG AIIGEW E 
Sbjct: 119 SYWGEENIOTTLLFIASMFIGGLSLFKVGLQNLLRFEFDMKTLMTVAVIGGAIIGEWAE 178 

Query: 170 GATWILFAISEALERYSMDKARQSIESLMDIAPKEALIRRGNEEMMIHVDEIQVGDIMI 229 

A WILFAISFALER+SMD+ARQSI SLMDIAPKEAL++R +E+MIHVD+I VGDIMI 
Sbjct: 179 VAIWILFAISEALERFSMDRARQSIRSLMDIAPKEALVKRNGQEIMIHVDDIAVGDIMI 238 

Query: 230 VKPGQKIAMDGIWKGTSTIMQAAITGESVPVTKITNDEVFAGTLNEEGLLEVKVTKRVE 289 

VKPGQK+AMDG+W G S +NQ AITGESVPV K ++EVFAGTLNEEGLLEV++TK VE 
Sbjct: 239 VKPGQKIAMDGVWSGYSAVNQTAITGESVPVEKTVDNEVFAGTLNEEGLLEVEITKLVE 298 

Query: 290 DTTLSKIIHLVEEAQAERAPSQAFVDKFAKYYTPAIVILALLIAWPPB-FGGDWSQWIY 348 

DTT+SKIIHLVEEAQ ERAPSQAFVDKFAKYYTP I+I+A L+A+VPPL F G W WIY 
Sbjct: 299 DTTISKIIHLVEEAQGERAPSQAFVDKFAKYYTPIIMIIATIjVAIVPPLFFDGSWETWIY 358 

Query: 349 QGLAVLWGCPCALWSTPVAWTAIGNAAKNGVLIKGGIHLFAAGHLKAIAFDKTGTLT 408 

QGIAVLWGCPCALV+STP+++V+AIGNAAK GVL+KGG++LE G LKAIAFDKTGTLT 
Sbjct: 359 QGLAVLWGCPCALVISTPISIVSAIGNAAKKGVLVKGGVYLEEMGALKAIAFDKTGTLT 418 

Query: 409 KGIPAVTD- - IVTYGRNENELITITSAIEKGSQHPLASAIMRKAEENGLKFNEVTVEDFQ 466 
KG+PAVTD ++ NE EL++I +A+E SQHPLASAIM+KAEE + +++V VEDF 

Sbjct: 419 KGVPAVTDYNVLNKQINEKELLSIITALEYRSQHPLASAIMKKAEEENITYSDVQVEDFS 478 

Query: 467 SITGKGVKAKI^INEMYWGSQNLFEE-LHGSISSDICKEKIADMQTQGKTVMVLGTEKEIL 525 
SITGKG+K +N YY+GS LF+E L D ++ + +Q QGKT M++GTEKEIL 

Sbjct: 479 S I TGKGI KG IVNGTTYYIGS PKLFKELLTNDFDKDLEQNVTTLQNQGKTAMI IGTEKE I L 538 

Query: 526 SFIAVADEMRESSKEVIGKLNNMGI-ETVMLTGDNQRTATAIGKQVGVSDIKADLLPEDK 584 

+ IAVADE+RESSKE++ KL+ +GI +T+MLTGDN+ TA AIG QVGVSDI+A+L+P+DK 
Sbjct: 539 AVIAVADEVRESSKEILQKLHQLGIKKTIMLTGDNKGTANAIGGQVGVSDIEAELMPQDK 598 

Query: 585 IOTIKELREKHQSVGMVGDGVNDAPALftASTVGVAMGGAGTDTALETADIALMSDDLSKL 644 

L+FIK+LR ++ +V MVGDGVNDAPALAASTVG+AMGGAGTDTALETAD+ALM DDL KL 
Sbjct: 599 LDFIKQLRSEYGWAMVGDGV1TOAPALAASTVGIAMGGAGTDTALETADVALMGDDLRKL 658 

Query: 645 PYTIKLSRKALAIIKQNITFSIAIKLVRLLLVMPGWLTLWIAIFADMGATLLVTLNSLRL 704 

P T+KLSRK L I IK NITF++AIK +A LLV+PGWLTLWIAI +DMGATLLV LN LRL 
Sbjct: 659 PSTVKLSRKTLNIIKANITFAIAIKFIASLLVIPGWLTLWIAILSDMGATLLVALNGLRL 718 

Query: 705 LKIKE 709 
+++KE 

Sbjct: 719 MRVKE 723 

There is also homology to SEQ ID 3506. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1984 

A DNA sequence (GBSx2094) was identified in S.agalactiae <SEQ ID 6133> which encodes the amino 
acid sequence <SEQ ID 6134>. Analysis of this protein sequence reveals the following: 

Possible site: 19 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0779 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1985 

A DNA sequence (GBSx2095) was identified in S.agalactiae <SEQ ID 6135> which encodes the amino 
acid sequence <SEQ ID 6136>. Analysis of this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -8.92 Transmembrane 123 - 139 ( 115 - 145) 
INTEGRAL Likelihood = -6.74 Transmembrane 172 - 188 ( 167 - 190) 
INTEGRAL Likelihood = -1.81 Transmembrane 80 - 96 ( 80 - 96) 

Final Results 

bacterial membrane Certainty=0.4567 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9923> which encodes amino acid sequence <SEQ ID 9924> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database, but there is 
homology to SEQ ID 4216. 

A related GBS gene <SEQ ID 8945> and protein <SEQ ID 8946> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: -6.41 
GvH: Signal Score (-7.5): -2.23 

Possible site: 58 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 3 value: -8.92 threshold: 0.0 

INTEGRAL Likelihood = -8.92 Transmembrane 123 - 139 ( 115 - 145) 
INTEGRAL Likelihood = -6.74 Transmembrane 172 - 188 ( 167 - 190) 

INTEGRAL Likelihood = -1.81 Transmembrane 80 - 96 ( 80 - 96) 
PERIPHERAL Likelihood =2.92 46 
modified ALOM score: 2.28 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.4567 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1986 

A DNA sequence (GBSx2096) was identified in S.agalactiae <SEQ ID 6137> which encodes the amino 
acid sequence <SEQ ID 6138>. This protein is predicted to be histidine rich P type ATPase (HRA-1) 
(copB). Analysis of this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood =-13.37 Transmembrane 318 - 334 ( 307 - 345) 

INTEGRAL Likelihood = -5.84 Transmembrane 347 - 363 ( 335 - 364) 
INTEGRAL Likelihood = -5.15 Transmembrane 88 - 104 ( 86 - 112) 
INTEGRAL Likelihood = -5.04 Transmembrane 651 - 667 ( 649 - 669) 

INTEGRAL Likelihood = -4.30 Transmembrane 156 - 172 ( 155 - 173) 

INTEGRAL Likelihood = -4.30 Transmembrane 669 - 685 ( 668 - 690) 

INTEGRAL Likelihood = -3.03 Transmembrane 62 - 78 ( 60 - 80) 



Final Results 

bacterial membrane Certainty=0.6349 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA62113 GB:U16658 histidine rich P type ATPase [Escherichia coli] 

Identities = 598/731 (81%) , Positives = 651/731 (88%) , Gaps = 36/731 (4%) 

Query: 1 MRNNKKHSSHSHHNHGDIDHSKHDHNEMEHSQMDHS 36 

MRNNK+HSSHSHHNHGD++HSKHDHNEMEHSQMDHS 
Sbjct: 1 MRNNKQHSSHSHHNHGDMEHSKHDHNEMEHSQMDHSAMGHCAMGGHAHHHHGDMDHSKHD 60 

Query: 37 NMDHSEMDHGAMGGHAHHHHGSFKEIFLKSLPLGIAILLITPMMDIQL 84 

MD+SEMDHGAMGGHAHHHHGSFK+IFLKSLPLGIAILLITP+M IQL 
Sbjct: 61 HNEMKHSQMDHSKMDYSEMDHGAMGGHAHHHHGSFKDIFLKSLPLGIAILLITPLMGIQL 120 

Query: 85 PFQIIFPYADWAAVLATILYIFGGKPFYMGAKDEFNSKAPGMMSLITLGITVSYAYSVY 144 
PFQIIFPYADWAAVLATILYIFGGKPF MGAKDEFNSK PGMMSLITLGITVSYAYSVY 

Sbjct: 121 PFQIIFPYADWAAVIATILYIFGGKPFLMGAKDEFNSKVPGMMSLITLGITVSYAYSVY 180 

Query: 145 AVAARYVTGEHVMDFFFEFTTLILIMLLGHWIEMKALGEAGDAQKALAELVPKDAHVVLE 204 
AVAARYVTGE VMDFFFEFTTLILIMLLGHWIEMKALGEAG+AQKALAELVPKDAHvvIiE 
Sbjct: 181 AYAAKYVTGEPVMDFFFEFTTLILIMIiLGHWIEMK^ 240 

Query: 205 DDSIETRPVSELQIGDVIRVQAGENVPADGIIIRGESRVNEALVTGESKPIEKKTGDEVI 264 

DDSIETRPV++LQ+GD+IRVQAGENVPADG I RGESRVNEALVTGESKPIEK GDEVI 
Sbjct: 241 DDSIETRPVADLQVGDLIRVQAGENVPAIOTIQRGESRVNEALVTGESKPIEKNPGDEVI 300 

Query: 265 GGSTNGGGVLYVEIKQTGDQSFISQVQTLISQAQSQPSRAENVAQKVASWLFYIAVWAL 324 

GGSTNG GVLYVE I KQTGD+ S FI SQVQTLI SQAQSQPSRAEN+AQKVA WLFYIAV+ AL 
Sbjct: 301 GGSTNGDGVLYVEIKQTGDKSFISQVQTLISQAQSQPSRAENLAQKVAGWLFYIAVIAAL 360 

Query: 325 IALLIWTIIADLPTAVIFTVTALVIACPHALGIAIPLWSRSTSLGASRGLLVKNREALE 384 

IAL+IW +IAD+PTAVIFTVT LVIACPHALGLAIPLV +RSTSLGASRGLLVK+R+ALE 
Sbjct: 361 IALVIW^1VIADVPTAVIFTVTTLVIACPHALGLAIPL'VTARSTSLGASRGLLVKDRDALE 420 

Query: 385 LTTKADVMVLDKTGTLTTGEFKVLDVTVIiSDKYSEEEITGLLAGIEAGSSHPIAQSIVNH 444 
LTT ADVMVLDKTGTLTTGEFKVLDV + +DKY+++EI LL+GIE GSSHPIAQSI+++ 

Sbjct: 421 LTTNADVMVLDKTGTLTTGEFKVLDVELFNDKYTKDEIVALLSGIEGGSSHPIAQSIISY 480 

Query: 445 AEAKGIKSVSFDSIEIVSGAGIEGEANGHHYQLISQKAYGKALRMDIPKGATLSILVENN 504 
AE +GI+ VS FDS I + + + SGAG+EG+ANGH YQLISQKAYG+ L MDIPKGAT+S+LVEN+ 
Sbjct: 481 AEO^IRPVSFDSIDVMSGAGVEGQANGHRYQLISQKAYGRNLDMDIPKGATISVLVEND 540 

Query: 505 EAIGAVALGDELKETSRNLIEVLKKYGIEPLMATGDNEEAAQGVAEVLGIQYQANQSPED 564 

EAIGAVALGDELK TS++LI+ LKK I+P+MATGDNE+AAQG AE+LGI Y ANQSP+D 
Sbjct: 541 EAIGAVALGDELKPTSKDLIQALKKNKIQPIMATGDNEKAAQGAAEILGIDYLANQSPQD 600 

Query: 565 KYKLVESMKNQNKWIMVGDGVNDAPSLALADVGIAIGAGTQVALDSADIILTQSDPGDI 624 

KY+LVE +K + K VIMVGDGVNDAPSLALADVGIAIGAGTQVALDSADIILTQ PGDI 
Sbjct: 601 KYELVEKLKAEGKKVIIWGDGViroAPSIiAIiADVGIAIGAGTQVALDSADIILTQYSPGDI 660 

Query: 625 ESFIEIiANKTTRKMKQNLWGAGYNFIAIPIAAGLLAPIGITLGPAFGAVLMSLSTVIVA 684 

SFIELA KTTRKMK+NLVWGAGYNFIAIPIAAG+LAPIGITL PA AVLMSLSTVIVA 
Sbjct: 661 ASFIELAQKTTRKMKENLVWGAGYNFIAIPIAAGILAPIGITLSPAVAAVLMSLSTVIVA 720 

Query: 685 INAMTLKLEPK 695 
INAMTLKLEPK 

Sbjct: 721 INAMTLKLEPK 731 

There is also homology to SEQ ID 3506. 

A related GBS gene <SEQ ID 8947> and protein <SEQ ID 8948> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7 
McG: Discrim Score: -19.12 
GvH: Signal Score (-7.5): -3.71 

Possible site: 27 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 7 value: -13.37 threshold: 0.0 
modified ALOM score: 3.17 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.6349 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

ORF02015(220 - 2304 of 2604) 

EGAD|37454|38974(1 - 731 of 731) histidine rich P type ATPase (HRA-1) {Escherichia coli} 
GP|643613|gb|AAA62113.1||U16658 histidine rich P type ATPase {Escherichia coli} 
PIR|JC2464|JC2464 probable copper-transporting ATPase (EC 3.6.1.-) HRA-1 Enterobacteriaceae spp. 
%Match = 67.4 

%Identity = 85.9 %Similarity = 93.7 

Matches = 598 Mismatches = 43 Conservative Sub.s = 54 

162 192 222 252 

PFRENYM*C*MRKF*NFKISL*YNKEELKMRNNKKHSSHSHHNHGDI 

10 20 30 40 50 



294 324 354 384 414 444 474 504 

DHSKHDHNEMEHSQMDHSNMDHSEMDHGAMGGHAHHHHGSFKEIFLKSLPLGIAILLITPMMDIQLPFQIIFPYADV 




70 80 90 100 110 120 130 



534 564 594 624 654 684 714 744 

VAAVLATILYIFGGKPFYMGAKDEFNSKAPGMMSLITLGITVSYAySVyAVAARYOTGEHvMDFFFEFTTLILIMLLGHW 



VAAvLATILYIFGGKPFLMGAKDEFNSKVPGMMSLITLGITVSYAYSVYAVAARYVTGEPvMDFFFEFTTLILIMLLGHW 
150 160 170 180 190 200 210 



774 804 834 864 894 924 954 984 

IEMKALGEAGDAQKALAELVPKDAHVVLEDDSIETRPVSELQIGDVIRVQAGENVPADGIIIRGESRVNEALVTGESKPI 




230 240 250 260 270 280 290 



1014 1044 1074 1104 1134 1164 1194 1224 

EKKTGDEVIGGSTNGGGVLYVEIKQTGDQSFISQVQTLISQAQSQ 
310 320 330 340 350 360 370 

1254 1284 1314 1344 1374 1404 1434 1464 

LPTAVI FTVTALVIACPHALGLAI PLWSRSTSLGASRGLLVKNREM,ELTTKADVMVLDKTGTLTTGEFKVLDVTVLSD 

:|||||||ll llllllllllllllll =lllllllllllll|:|=llllll llllllllllllllllllilll :::| 
VPTAVIFTVTTLVIACPHALGIAIPLOTARSTSLGASRGLLVKDRDA^ 

390 400 410 420 430 440 450 

1494 1524 1554 1584 1614 1644 1674 1704 

KYSEEEITGLLAGIEAGSSHPIAQSIVNHAEAKGIKSVSFDSIEIVSGAGIEGEANGHHYQLISQKAYGKALRMDIPKGA 

ll = : = ll Ihlll =11= I I I I I I : = = I I I I : I I = I I I I 1111111111= I Mill 

KyTKDEIVALLSGIEGGSSHPIAQSIISYAEQQGIRPVSFDSIDVMSGAGVEGQANGHRYQLISQKAYGRNLDMDIPKGA 
470 480 490 500 510 520 530 

1734 1764 1794 1824 1854 1884 1914 1944 

TLSILVE^roFAIGAVALGDELKETSRNLIEVLKKYGIEPLmTGDNEEAAQGVAEVLGIQYQANQSPEDKYKLVESMKNQ 

hhlllhlllllllllllll ||::||: III h I = I I I I I I I = I I II Ihlll I I I I I I = I I I = I I I =1 = 
TISVLTONDEAIGAVALGDELKPTSKDLIQALKKNKIQPIMATGDNEKAAQGAAEILGIDYIJWQSPQDKYELVEKLKAE 
550 560 570 580 590 600 610 

1974 2004 2034 2064 2094 2124 2154 2184 

NKWIMVGDGVKTOAPSIA1ADVGIAIGAGTQVALDSADIILTQ 

i iiiiimimmiiiiiiiiiiiiiiiiiiiiimi 1 1 1 1 mill iiiiiihiiniiiiiiiiiii 

gkkvimvgdgvndapslaladvgiaigagtqvaldsadii^ 

630 640 650 660 670 680 690 

2214 2244 2274 2304 2334 2364 2394 2424 

AAGLLAPIGITLGPAFGAVLMSLSTVIVAINAMTLKLEPK*NEAGTKKHWLV* PPSRIGSDQLVCCIRKI IDR* I FDKNR 

IIMIIIMII II lllllllllllllllllllllll 
AAGIIAPIGITLSPAVAAVLMSLSTVIVAINAMTLKLEPK 
710 720 730 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1987 

A DNA sequence (GBSx2097) was identified in S.agalactiae <SEQ ID 6139> which encodes the amino 
acid sequence <SEQ ID 6140>. This protein is predicted to be Cop A. Analysis of this protein sequence 
reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2197 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA40599 GB:X57326 ORF-1 [Thiobacillus ferrooxidans] 
Identities = 26/65 (40%) , Positives = 40/65 (61%) , Gaps = 2/65 (3%) 

Query: 1 MKQEILL--DGVKCAGCANTVQERFSAIEGVESVEVDLATKKAVLESQTEIDTETLNAAL 58 

M Q+I L G+ CA CA++V++ I G++S +V LAT +A + Q+ I TE L AA+ 
Sbjct: 1 MSQKIFLRITGMTCAHCAHSVEKALLGIHGIDSAQVSLATNQAEVFLQSSIPTEALLAAV 60 

Query: 59 AETNY 63 
+ Y 

Sbjct: 61 TQAGY 65 



There is also homology to SEQ ID 3510. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1988 

A DNA sequence (GBSx2098) was identified in S.agalactiae <SEQ ID 6141> which encodes the amino 
acid sequence <SEQ ID 6142>. Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have no N-terminal signal sequence 



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1989 

A DNA sequence (GBSx2099) was identified in S.agalactiae <SEQ ID 6143> which encodes the amino 
acid sequence <SEQ ID 6144>. This protein is predicted to be heavy-metal transporting P-type ATPase 
(b0484). Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -4.09 Transmembrane 131 - 147 ( 130 - 150) 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB01764 GB:U42410 heavy-metal transporting P-type ATPase 
[Proteus mirabilis] 
Identities = 98/153 (64%) , Positives = 123/153 (80%) 

Query: 2 KAVKALRRRGWVIMITGDNKRTAKAIAKQVGIDSVLSEVLPEDKAEEVKKLQEAGKKVA 61 

+A+KAL G++V MITGDNK TAKAIAKQ+GID +++EVLP+ K +K+L + G KVA 
Sbjct: 649 EAIKALHALGLKVAMITGDNKATAKAIAKQLGIDEIVAEVLPDGKVAALKQLSQKGDKVA 708 

Query: 62 MVGDGINDAPALAQANVGIAVGSGTDVAIESADIVLMRNDLTAVLTTIDLSHATLRNIKQ 121 

VGDGINDAPALAQA+VG+A+G+GTDVAIE+AD+VLM DL V+ I LS AT+RNIKQ 
Sbjct: 709 FVGDGINDAPALAQADVGLAIGTGTDVAIEAADVVLMSGDLRGVVDAIALSQATIRNIKQ 768 

Query: 122 NLFWAFAYNLVGIPVAMGLLYIFGGLLMSPMLA 154 

NLFW FAYN + IPVA G+LY G+L+SP+ A 
Sbjct: 769 NLFWTFAYNALLIPVAAGMLYPINGMLLSPIFA 801 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3505> which encodes the amino acid 
sequence <SEQ ID 3506>. Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-10.83 Transmembrane 328 - 344 ( 314 - 348) 






Final Results 

bacterial cytoplasm Certainty=0. 3220 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 



Final Results 

bacterial membrane Certainty=0.2635 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

INTEGRAL Likelihood = 7.01 Transmembrane 354 - 370 ( 347 - 377) 

Final Results 

bacterial membrane Certainty=0 . 5331 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 92/152 (60%) , Positives = 123/152 (80%) 

Query: 4 VKALRRRGVBVIMITGDNKRTAKAIAKQVGIDSVLSEVLPEDKAEEVKKLQEAGKKVAMV 63 

V+AL + G+ IM+TGD+ TAKAIA QVGI V+S+VLP+ KA + L+ G+KVAMV 
Sbjct: 544 VEALHQLGIHTIMLTGDHDATAKAIASQVGITDVISQVLPDQKAGVIADLRSQGRKVAMV 603 

Query: 64 GDGINDAPALAQANVGIAVGSGTDVAIESADIVLMRNDLTAVLTTIDLSHATLRNIKQNL 123 

GDGINDAPALA A++GIA+GSGTD+AIESAD++LM+ D+ ++ + LS T+R +K+NL 
Sbjct: 604 GDGINDAPALAVADIGIAMGSGTDIAIESADVILMKPDMLDLVKAMSLSRVTMRIVKENL 663 

Query: 124 FWAFAYNLVGIPVAMGLLYIFGGLLMSPMLAG 155 

FWAF YN++ IPVAMGLL++FGG L++PMLAG 
Sbjct: 664 FWAFIYNVLMIPVAMGLLHLFGGPLLNPMLAG 695 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
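
The "Identities" and "Positives" lines that head each alignment in this document can be checked mechanically. The following is a minimal sketch, not part of the original analysis pipeline; the function name parse_alignment_stats is hypothetical. It assumes the printed percentages are truncated rather than rounded, which is what the figures in this document suggest (for example, 123/152 is printed as 80% and 212/213 as 99%).

```python
import re

# Matches "Identities = 92/152", "Positives = 123/152", "Gaps = 3/204", tolerating OCR spacing.
STAT_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_stats(line):
    """Return {name: (count, aligned_length, truncated_percent)} for one header line."""
    stats = {}
    for name, count, length in STAT_RE.findall(line):
        count, length = int(count), int(length)
        # Integer division truncates, which is the assumption stated above.
        stats[name] = (count, length, count * 100 // length)
    return stats


if __name__ == "__main__":
    header = "Identities = 92/152 (60%) , Positives = 123/152 (80%)"
    print(parse_alignment_stats(header))
```

Comparing the recomputed percentage with the printed one is a quick way to spot header lines whose digits were damaged during text extraction.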

Example 1990 

A DNA sequence (GBSx2100) was identified in S.agalactiae <SEQ ID 6145> which encodes the amino 
acid sequence <SEQ ID 6146>. This protein is predicted to be CopY. Analysis of this protein sequence 
reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 



Final Results 



bacterial cytoplasm Certainty=0. 2067 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG10085 GB:AF296446 CopY [Streptococcus mutans] 
Identities = 63/139 (45%) , Positives = 96/139 (68%) 



Query: 8 TSITDAEWEVMRVVWANDLVTSKOTISVLKEKMDWTESTIKTILGRLVEKGVLNTEQEGR 67 
TSI++AEWEVMRVVWA + +S +I++L W+ STIKT++ RL EKG L ++++GR 
Sbjct: 2 TSISNAEWEVMRVVWAKQMTSSSEIIAILSRTYCWSASTIKTLITRLSEKGYLTSQRQGR 61 

Query: 68 KFIYTANIVEKEAVRDFAEDIFNRICKKKVGNVIGSIIEDHVLSFDDIDRLEKILEIKKS 127 
K+IY++ I E+EA+ ++F+RIC K +I ++E+ ++ DI++LE +L KK+ 
Sbjct: 62 KYIYSSLISEEEALEQQVSEVFSRICVTKHQALIRHLVEETPMTLSDIEKLEALLLSKKA 121 



Query: 128 FAVEEVDCQCTEGQCDCHE 146 

AV EV C C GQC C+E 
Sbjct: 122 NAVPEVKCNCIVGQCSCYE 140 



There is also homology to SEQ ID 3502. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1991 

A DNA sequence (GBSx2101) was identified in S.agalactiae <SEQ ID 6147> which encodes the amino 
acid sequence <SEQ ID 6148>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2829 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
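
Each "Final Results" block lists three candidate locations with a certainty value, and the predicted location is the one with the highest certainty. The sketch below shows one way such a block could be read programmatically. It is an illustration only: the names parse_final_results and predicted_site are hypothetical, and the regular expression is an assumption designed to tolerate the stray spaces that text extraction left inside the numbers (for example "0. 2829" or "0 . 0000").

```python
import re

# One "Final Results" entry: location name, certainty (possibly with stray spaces), verdict.
FINAL_RE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\s*-*\s*"
    r"Certainty\s*=\s*([0-9]\s*\.\s*[0-9 ]+?)\s*\((.*?)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for every entry found in text."""
    results = {}
    for site, raw_value, verdict in FINAL_RE.findall(text):
        # Remove extraction whitespace before converting, e.g. "0. 2829" -> 0.2829.
        results[site] = (float(re.sub(r"\s+", "", raw_value)), verdict.strip())
    return results


def predicted_site(results):
    """Return the location with the highest certainty."""
    return max(results, key=lambda site: results[site][0])


if __name__ == "__main__":
    block = (
        "bacterial cytoplasm Certainty=0. 2829 (Affirmative) < suco\n"
        "bacterial membrane Certainty=0 . 0000 (Not Clear) < suco\n"
        "bacterial outside Certainty=0 . 0000 (Not Clear) < suco\n"
    )
    print(predicted_site(parse_final_results(block)))
```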

Example 1992 

A DNA sequence (GBSx2102) was identified in S.agalactiae <SEQ ID 6149> which encodes the amino 
acid sequence <SEQ ID 6150>. This protein is predicted to be DS RF protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 57 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood =-13.21 Transmembrane 142 - 158 ( 136 - 169) 
INTEGRAL Likelihood = -3.45 Transmembrane 70 - 86 ( 66 - 88) 

INTEGRAL Likelihood = -3.13 Transmembrane 178 - 194 ( 176 - 195) 

Final Results 

bacterial membrane Certainty=0.6286 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA26611 GB:L10909 putative [Staphylococcus aureus] 
Identities = 98/204 (48%) , Positives = 148/204 (72%) , Gaps = 3/204 (1%) 

Query: 4 TIISAIGVYISTSIDYLIVLIILFAQLSQNKQKWHIYAGQYLGTGLLVGASLVAAY-VVN 62 

TI++A VY++T IDYL++LI+LF+Q+ + + K HI+ GQY+GT +++GASL+ A VVN 
Sbjct: 18 TILTATAVYVATGIDYLVILILLFSQVKKGQVK-HIWIGQYIGTAIVIGASLLVAQGVVN 76 

Query: 63 FVPEAWMVGLLGLIPIYLGIRFAIVGEGEEEEEEEIIERLEQSKANQLFWTVTLLTIASG 122 

+P+ W+ +GLLGL+ P+YLG+ + I GE E+E+E I+ K NQLF T+ + +AS 

Sbjct: 77 LIPQQWVIGLLGLLPLYLGVKIWIKGE-EDEDESSILSLFSSGKFNQLFLTMIFIVLASS 135 

Query: 123 GDNLGIYIPYFASLDWSQTLVVLLVFAIGIIIFCELSWVLSSIPLISETIEKYQRIIVPL 182 

D+ IYIPYF +L S+ +V +VF I + + C +S+ L+S ISETIEKY+R IVP+ 
Sbjct: 136 ADDFSIYIPYFTTLSMSEIFIVTIVFLIMVGVLCYVSYRLASFDFISETIEKYERWIVPI 195 

Query: 183 VFIPLGLYIMYESGTIETFLNFIL 206 
VFI LG+YI++E+GT ++F+L 

Sbjct: 196 VFIGLGIYILFENGTSNALISFLL 219 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6151> which encodes the amino acid 
sequence <SEQ ID 6152>. Analysis of this protein sequence reveals the following: 

Possible site: 34 
>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane Certainty=0.6265 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 



The protein has homology with the following sequences in the databases: 

>GP:AAF42284 GB:AE002544 cadmium resistance protein [Neisseria 
meningitidis MC58] 
Identities = 201/208 (96%) , Positives = 205/208 (97%) 

Query: 1 MRCFMIQNVVTSIILYSGTAVDLLIILMLFFAKRKSRKDIINIYLGQFLGSVSLILLSLL 60 

MRCFMIQNVVTSIILYSGTAVDLLIILMLFFAKRKSRKDIINIYLGQFLGSVSLILLSLL 
Sbjct: 1 MRCFMIQNVVTSIILYSGTAVDLLIILMLFFAKRKSRKDIINIYLGQFLGSVSLILLSLL 60 

Query: 61 FAFVLDYIPSKEILGLLGLIPIFLGLKVLLLGDSDGEAIAKEGLSKDNKNLIFLVAMITF 120 

FAFVLDYIPSKEILGLLGLIPI LG+KVLLLGDSDGE30^KEGL KDNKNLI FLVAMITF 
Sbjct: 61 FAFVLDYIPSKEILGLLGLIPIBI^IKVLLLGDSDGFAIAKEGI^KDNKNLIFLVAMITF 120 

Query: 121 ASCGADNIGVFVPYFTTLNLANLIVALLTFLVMIYLLVFSAQKLAQVPSVGETLEKYSRW 180 

ASCGADNIGVFVPYFTTLNLANLIVALLTFLVMIYLLVFSAQK+AQVPSVGETLEKYSRW 
Sbjct: 121 ASCGADNIGVFVPYFTTLNLANLIVALLTFLVMIYLLVFSAQKIAQVPSVGETLEKYSRW 180 

Query: 181 FIAVVYLGLGMYILIENNSFDMLWAVLG 208 

F+AVVYLGLG+YIL+ENNSFDMLW VLG 
Sbjct: 181 FVAVVYLGLGIYILVENNSFDMLWTVLG 208 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 71/200 (35%) , Positives = 130/200 (64%) , Gaps = 4/200 (2%) 

Query: 1 MGQTIISAIGVYISTSIDYLIVLIILFAQLSQNKQKWHIYAGQYLGTGLLVGASLVAAYV 60 

M Q ++++I +Y T++D LI+L++ FA+ K +IY GQ+LG+ L+ SL+ A+V 
Sbjct: 5 MIQNVVTSIILYSGTAVDLLIILMLFFAKRKSRKDIINIYLGQFLGSVSLILLSLLFAFV 64 

Query: 61 VNFVPEAWMVGLLGLIPIYLGIRFAIVGEGEEEEEEEIIERLEQSKANQLFWTVTLLTIA 120 

++++P ++GLLGLIPI+LG++ ++G+ +E +EL+ N+FV ++T A 
Sbjct: 65 LDYIPSKEILGLLGLIPIFLGLKVLLLGDSDGEAIAK--EGLSKDNKNLIF-LVAMITFA 121 

Query: 121 S-GGDNLGIYIPYFASLDWSQTLVVLLVFAIGIIIFCELSWVLSSIPLISETIEKYQRII 179 

S G DN+G+++PYF +L+ + +V LL F + I + + L+ +P + ET+EKY R 
Sbjct: 122 SCGADNIGVFVPYFTTLNLANLIVALLTFLVMIYLLVFSAQKLAQVPSVGETLEKYSRWF 181 

Query: 180 VPLVFIPLGLYIMYESGTIE 199 

+ +V++ LG+YI+ E+ + + 
Sbjct: 182 IAVVYLGLGMYILIENNSFD 201 

SEQ ID 6150 (GBS174) was expressed in and purified from E.coli. The purified protein is shown in lane 7 
of Figure 223. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
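
The "INTEGRAL Likelihood" lines reported for membrane proteins in this document each give one predicted transmembrane segment. As a rough illustration of how these lines can be collected and ordered along the sequence, the sketch below parses them into tuples. The names parse_tm_segments and strongest_segment are hypothetical. Treating the most negative likelihood as the strongest prediction is an assumption, based on the ALOM "value" in the related-gene listings of this document matching the most negative INTEGRAL likelihood.

```python
import re

# "INTEGRAL Likelihood =-13.21 Transmembrane 142 - 158 ( 136 - 169)"
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_tm_segments(text):
    """Return [(start, end, likelihood), ...] sorted by start position."""
    segments = [
        (int(start), int(end), float(likelihood))
        for likelihood, start, end in TM_RE.findall(text)
    ]
    return sorted(segments)


def strongest_segment(segments):
    """Return the segment with the most negative likelihood (assumed strongest)."""
    return min(segments, key=lambda seg: seg[2])


if __name__ == "__main__":
    listing = (
        "INTEGRAL Likelihood =-13.21 Transmembrane 142 - 158 ( 136 - 169)\n"
        "INTEGRAL Likelihood = -3.45 Transmembrane 70 - 86 ( 66 - 88)\n"
        "INTEGRAL Likelihood = -3.13 Transmembrane 178 - 194 ( 176 - 195)\n"
    )
    for segment in parse_tm_segments(listing):
        print(segment)
```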

Example 1993 

A DNA sequence (GBSx2103) was identified in S.agalactiae <SEQ ID 6153> which encodes the amino 
acid sequence <SEQ ID 6154>. This protein is predicted to be Pgm. Analysis of this protein sequence 
reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4324 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB96418 GB:AJ243290 phosphoglucomutase [Streptococcus thermophilus] 
Identities = 65/76 (85%) , Positives = 71/76 (92%) 

Query: 1 MTYTENLQKWLDFEQLPDYLRQELLSMDEKTKEDAFYTNLEFGTAGMRGYIGAGTNRINI 60 

M+YTEN QKWLDF +LP YLR EL+SMDEKTKEDAFYTNLEFGTAGMRG IGAGTNRINI 
Sbjct: 1 MSYTENYQKWLDFAELPAYLRDELVSMDEKTKEDAFYTNLEFGTAGMRGLIGAGTNRINI 60 

Query: 61 YWRQATEGLAKLIET 76 

YWRQATEGLA+LI++ 
Sbjct: 61 YWRQATEGLAQLIDS 76 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6155> which encodes the amino acid 
sequence <SEQ ID 6156>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0. 4324 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 75/76 (98%) , Positives = 75/76 (98%) 

Query: 1 MTYTENLQKWLDFEQLPDYLRQELLSMDEKTKEDAFYTNLEFGTAGMRGYIGAGTNRINI 60 
MTYTEN QKWLDFEQLPDYLRQELLSMDEKTKEDAFYTNLEFGTAGMRGYIGAGTNRINI 
Sbjct: 1 MTYTENFQKWLDFEQLPDYLRQELLSMDEKTKEDAFYTNLEFGTAGMRGYIGAGTNRINI 60 

Query: 61 YWRQATEGLAKLIET 76 

YWRQATEGLAKLIET 
Sbjct: 61 YWRQATEGLAKLIET 76 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1994 

A DNA sequence (GBSx2104) was identified in S.agalactiae <SEQ ID 6157> which encodes the amino 
acid sequence <SEQ ID 6158>. This protein is predicted to be a membrane protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 53 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -6.21 Transmembrane 94 - 110 ( 93 - 115) 
INTEGRAL Likelihood = -4.14 Transmembrane 172 - 188 ( 166 - 188) 
INTEGRAL Likelihood = -1.97 Transmembrane 130 - 146 ( 129 - 149) 
INTEGRAL Likelihood = -0.16 Transmembrane 62 - 78 ( 62 - 79) 



Final Results 

bacterial membrane Certainty=0.3484 (Affirmative) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA80247 GB:Z22520 membrane protein [Bacillus acidopullulyticus] 
Identities = 47/185 (25%) , Positives = 80/185 (42%) , Gaps = 23/185 (12%) 

Query: 1 MKKKNKSSNIAIIAIFFAIMLVIHFLSSFIFSFWLVPIKPTLMHIPVIIASIAYGPRIGA 60 

MKK +11 + A+ +++ T+MHIP II I GP +G 

Sbjct: 1 MKKSLTVRDIVIAGVLGAVAILLGVTRLGYIPVPTAAGNATIMHIPAIIGGIMQGPWGL 60 

Query: 61 TLGALMGGI S VANSS I VLLPTSYLFSPFVENGNFYSLI IALVPRILIGI I PYFVYKLLHN 120 

+GA+ G S N+++ L F +++++PR+ IG++ + VY + 

Sbjct: 61 IVGAIFGISSFLNATVPL FKDPLVSILPRLFIGWAWLVYIGIRR 105 

Query: 121 R FGLAISGAIGSLTNTVFVLSGIFIFFSSTYNGNIKLMLAGIISSNSLAEMVIAAII 177 

+ + +S IG+LTNT VL+ F + +A +N L E V+ 1+ 
Sbjct: 106 KSEYVAVGLSAFIGTLTNTALVLA- -MAVFRHYLTAGVAWTVA ITNGLPEAWGTIV 160 

Query: 178 VYLTV 182 
V 

Sbjct: 161 TLAW 165 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6159> which encodes the amino acid 
sequence <SEQ ID 6160>. Analysis of this protein sequence reveals the following: 

Possible site: 31 
>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane Certainty=0. 4588 (Affirmative) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

>GP:CAA80247 GB:Z22520 membrane protein [Bacillus acidopullulyticus] 
Identities = 47/193 (24%) , Positives = 86/193 (44%) , Gaps = 28/193 (14%) 

Query: 8 RKSADISRIAIFFAIMLVIHFVSSLVFNIWPIPI KPTLVHIPVIIASVLYGPRIGAI 64 

+KS + I I + V + P+P T++HIP II ++ GP +G I 

Sbjct: 2 KKSLTVRDIVIAGVLGAVAILLGVTRLGYIPVPTAAGNATIHHIPAI IGGIMQGPWGLI 61 

Query: 65 LGGLMGIISVITNTIILLPTNYLFSPFVDHGTFASLIIAIIPRILIGITPYYCYKLIPNQ 124 

+G + GI S + T+ L F +++I+PR+ IG+ + Y I + 

Sbjct: 62 VGAIFGISSFLNATVPL FKDPLVSILPRLFIGWAWLVYIGIRRK 106 

Query: 125 FGLIVSGI IGSLTNTIFVLS-GIFIFFATVFDGNIKALLTAIISSNAIVEMIISAII 180 

+ G+ IG+LTNT VL+ +F + T + + +N + E ++ 1+ 

Sbjct: 107 SEYVAVGLSAFIGTLTNTALVLAMAVFRHYLTA GVAWTVAITNGLPEAWGTIV 160 

Query: 181 TFVLI PTLSRLKR 193 

T ++ ++ R 
Sbjct: 161 TLAWLAWKQIGR 173 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 121/184 (65%) , Positives = 157/184 (84%) 

Query: 6 KSSNIAIIAIFFAIMLVIHFLSSFIFSFWLVPIKPTLMHIPVIIASIAYGPRIGATLGAL 65 
KS++I+ IAIFFAIMLVIHF+SS +F+ W +PIKPTL+HIPVIIAS+ YGPRIGA LG L 

Sbjct: 9 KSADISRIAIFFAIMLVIHFVSSLVFNIWPIPIKPTLVHIPVIIASVLYGPRIGAILGGL 68 

Query: 66 MGGISVANSSIVLLPTSYLFSPFVENGNFYSLIIALVPRILIGIIPYFVYKLLHNRFGLA 125 
MG ISV ++I+LLPT+YLFSPFV++G F SLIIA++PRILIGI PY+ YKL+ N+FGL 
Sbjct: 69 MGIISVITNTIILLPTNYLFSPFVDHGTFASLIIAIIPRILIGITPYYCYKLIPNQFGLI 128 

Query: 126 ISGAIGSLTNTVFVLSGIFIFFSSTYNGNIKLMLAGIISSNSLAEMVIAAIIVYLTVPRI 185 

+SG IGSLTNT+FVLSGIFIFF++ ++GNIK +L IISSN++ EM+I+AII ++ +P + 
Sbjct: 129 VSGIIGSLTNTIFVLSGIFIFFATVFDGNIKALLTAIISSNAIVEMIISAIITFVLIPTL 188 

Query: 186 LNIK 189 
+K 

Sbjct: 189 SRLK 192 

A related GBS gene <SEQ ID 8949> and protein <SEQ ID 8950> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 5 
McG: Discrim Score: 13.42 
GvH: Signal Score (-7.5): -1.93 
Possible site: 53 

>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 2 value: -6.21 threshold: 0.0 

INTEGRAL Likelihood = -6.21 Transmembrane 94 - 110 ( 93 - 115) 
INTEGRAL Likelihood = -0.16 Transmembrane 62 - 78 ( 62 - 79) 
PERIPHERAL Likelihood = 1.70 123 

modified ALOM score: 1.74 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0. 3484 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

ORF01561(301 - 723 of 1017) 

EGAD|38021|39600(1 - 129 of 183) hypothetical membrane protein {Bacillus acidopullulyticus} 
GP|806536|emb|CAA80247.1| |Z22520 membrane protein {Bacillus acidopullulyticus} 
%Match = 7.6 
%Identity = 29.7 %Similarity = 53.9 

Matches = 38 Mismatches = 57 Conservative Sub.s = 31 



162 192 222 252 282 312 342 372 

KKIGYQEIEPRISLIACGDTGQGAIADISTILKCIQEVAN*AVNLYTISSLI*GVIMKKKNKSSNIAIIAIFFAIMLVIH 

III =1 I == |: = = = 

MKKSLTVRDI VIAGVLGAVAILLG 

10 20 



402 432 462 492 522 552 582 612 

FLSSFIFSFWLVPIKPTLMHIPVIIASIAYGPRIGATLGALMGGISVANSSIVLLPTSYLFSPFVENGNFYSLIIALVPR 

Mill II I II =1 =11= I I I::: I I = = = = = 11 

VTRLGYI PVPTAAGNATIMHI PAI IGGIMQGP WGLIVGAIFGISSFLNATVPL FKDPLVSILPR 

40 50 60 70 80 

642 663 693 723 753 783 813 843 

ILIGIIPYFVY KLLHNRFGIAISGAIGSLTNIVFvXSGIFIFFSSTYNGNIKLMLAGIISXNSLAEMVIAAIIVYLT 

::||:: ::|| = = =1 11=1111 =1 = 

LFIGVVAWLWIGIRRKSEYVAVGLSAFIGTLTNTALVIAMAVFRHYLTAGVAWTVAI 

100 110 120 130 140 150 160 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1995 

A DNA sequence (GBSx2105) was identified in S.agalactiae <SEQ ID 6161> which encodes the amino 
acid sequence <SEQ ID 6162>. Analysis of this protein sequence reveals the following: 

Possible site: 18 

>>> Seems to have no N-terminal signal sequence (or aa 1-18) 

Final Results 

bacterial cytoplasm Certainty=0 . 0165 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC44502 GB:U48885 DNA/pantothenate metabolism flavoprotein 
[Streptococcus mutans] 
Identities = 101/145 (69%) , Positives = 122/145 (83%) 

Query: 1 MIKRITLAVTGSISAYKAADLTSQLTKIGYDVHIIMTQAATEFITPLTLQVLSKNPIHLD 60 

M K+I LAV+GSI+AYKAADL+ QLTK+GY V++ MT AA +FI PLTLQVLSKNP++ + 
Sbjct: 1 MTKKILLAVSGSIAAYKAADLSHQLTKLGYHVNVFMTNAAKQFIPPLTLQVLSKNPVYSN 60 

Query: 61 VMDEHNPKIINHIELAKRTDLFIVAPASANTIAHIAYGFADNIVTSVALAMPDETPKLIA 120 
VM E +P++INHI LAK+ DLF++ PASANT+AHIA+GFADNIVTSVALA+P E PK A 

Sbjct: 61 VMKEDDPQVINHIALAKQADLFLLPPASANTLAHIAHGFADNIVTSVALALPLEVPKFFA 120 

Query: 121 PAMNTKMYHNTITQRNIDILKKIGY 145 
PAMNTKMY N ITQ NI +LKK GY 
Sbjct: 121 PAMNTKMYENPITQSNITLLKKFGY 145 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6163> which encodes the amino acid 
sequence <SEQ ID 6164>. Analysis of this protein sequence reveals the following: 

Possible site: 18 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0 . 0076 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 146/178 (82%) , Positives = 155/178 (87%) 

Query: 1 MIKRITLAVTGSISAYKAADLTSQLTKIGYDVHIIMTQAATEFITPLTLQVLSKNPIHLD 60 

M K ITLAV+GSISAYKAADLTSQLTKIGYDVHIIMTQAAT+FITPLTLQVLSKN IHLD 
Sbjct: 1 MTKHITLAVSGSISAYKAADLTSQLTKIGYDVHIIMTQAATQFITPLTLQVLSKNAIHLD 60 

Query: 61 VMDEHNPKIINHIELAKRTDLFIVAPASANTIAHIAYGFADNIVTSVALAMPDETPKLIA 120 
VMDEH+PK+INHIELAKRTDLFIVAPASANTIAHIAYGFADN+VTSVALA+P TPKLIA 

Sbjct: 61 VMDEHDPKVINHIELAKRTDLFIVAPASANTIAHIAYGFADNLVTSVALALPATTPKLIA 120 

Query: 121 PAMNTKMYHNTITQRNIDILKKIGYQEIEPRISLLACGDTGQGALADISTILKCIQEV 178 
PAMNTKMY N ITQ NI L IG+ EI P+ SLLACGD G GALADI IL I + 
Sbjct: 121 PAMNTKMYQNPITQENIKRLSTIGFTEIPPKSSLLACGDKGPGALADIDVILATIDTI 178 



SEQ ID 6162 (GBS236) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 52 (lane 5; MW 21.6kDa). 
Purified GBS236-GST is shown in Figure 208 (lane 6) and in Figure 225 (lanes 4-5). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1996 

A DNA sequence (GBSx2106) was identified in S.agalactiae <SEQ ID 6165> which encodes the amino 
acid sequence <SEQ ID 6166>. This protein is predicted to be pantothenate metabolism flavoprotein 
homolog (dfp). Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0 . 2325 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 9835> which encodes amino acid sequence <SEQ ID 9836> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG39941 GB:AF301375 MTW1216 [Methanothermobacter wolfeii 
prophage psiM100] 

Identities = 71/229 (31%), Positives = 117/229 (51%), Gaps = 27/229 (11%) 

Query: 6 MKILITSGGTTEKIOTVRSITNHATGTIiGKIIAEKYLREGHQ 65 
+++L++ GGT E ID TO ITN ++G +G +A + +G VTLV V + + L 

Sbjct: 172 LRVLVSLGGTLEPIDPTOVim^SGRMGIAVAREAYIO^ADVTLVA--GTVSVDIPSQL 229 

Query: 66 STFEIED VDSL I KTLKPLVKEHDI L I HSMAVSDYTPVYMADFEKVKS SDHLDTFLRKDNH 125 

T E + + + L+ EHD+ + + AVSD+ PVY 
Sbjct: 230 RTVRAETAHEMAEAVAELIGEHDVFVSAAAVSDFRPVYS 268 

Query: 126 EGKISSESEYQVLFLKKTPKVISLVKKmPQITLVGFKLLVNVTKENLFKVARHSLIKNK 185 

E KISS+SE L LK PK+I + ++ NP+ +VGFK V++E L AR + + 
Sbjct: 269 EEKISSDSEI-TLRLKPNPKIIRMARETNPEAFIVGFKAEHGVSEEEIiIAAARKQIEDSV 327 

Query: 186 ATFILANDL-IDITSKHHIAYLLDHDNVYKATT--KEDIAQLIYEKVKK 231 

A ++AND+ ++ + ++ + V + T KE++A LI ++ K 

Sbjct: 328 ADMVVANDVSVEGFGSENNRAIIVSEGVTELPTMKKEELAGLIIGEIMK 376 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6167> which encodes the amino acid 
sequence <SEQ ID 6168>. Analysis of this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1737 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 142/230 (61%) , Positives = 170/230 (73%) 

Query: 4 MAMKILITSGGTTEKIDTVRSITNHATGTLGKIIAEKYLREGHQVTLVTTKLAVKPESAT 63 

M MK++ITSGGTTE ID VR ITNH+TG LGK+I E++L+ H VTLVTTK A KP 
Sbjct: 1 MTMKLIITSGGTTEPIDAVRGITNHSTGQLGKLITERFLQYHHDVTLVTTKTATKPLPNK 60 

Query: 64 NLSTFEIEDVDSLIKTLKPLVKEHDILIHSMAVSDYTPVYMADFEKVKSSDHLDTFLRKD 123 

L E+E V+ L+ LK V HDILIHSMAVSDYTPVYM D E+V +D+L+ FL + 
Sbjct: 61 RLRIIEVETVNDLMAALKDQVPHHDILIHSMAVSDYTPVYMTDLEQVSQADNLNCFLCEH 120 

Query: 124 NHEGKISSESEYQVLFLKKTPKVISLVKKWNPQITLVGFKLLVNVTKENLFKVARHSLIK 183 

N E KISS S+YQVLFLKKTPKVIS VK+WNP I LVGFKLLVNV +E L KVAR SL K 
Sbjct: 121 NSEPKISSASDYQVLFLKKTPKVISYVKQWNPNIKLVGFKLLVNVPQEELIKVARASLAK 180 

Query: 184 NKATFILANDLIDITSKHHIAYLLDHDNVYKATTKEDIAQLIYEKVKKYD 233 
N A +ILANDL+DI + H A L+ ++ V A TKE IA L+YE++ K+D 

Sbjct: 181 NHADYILANDLVDIQTGMHKALLISNNEVASADTKEAIADLLYERMTKHD 230 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1997 

A DNA sequence (GBSx2107) was identified in S.agalactiae <SEQ ID 6169> which encodes the amino 
acid sequence <SEQ ID 6170>. Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -0.22 Transmembrane 117 - 133 ( 117 - 133) 

Final Results 

bacterial membrane Certainty=0 . 1086 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 9833> which encodes amino acid sequence <SEQ ID 9834> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07541 GB:AP001520 unknown conserved protein in B. subtilis 
[Bacillus halodurans] 
Identities = 94/221 (42%), Positives = 133/221 (59%), Gaps = 2/221 (0%) 

Query: 52 AEKPFIWTEVFLREINRSNQEIILHIWPMTKTVILGMLDRELPHLELAKKEIISRGYEPV 111 
A+F + + I+S LW TV+LG+ D LP ++ + + ++ + 

Sbjct: 27 ALQSFAYDDTLCTSIGKSQSPPTLRAWVHHNTVVLGIQDSRLPQIKAGIEALKGFQHDVI 86 

Query: 112 VRNFGGLAWADEGILNFSLVIPDVFERKLSISDGYLIMVDFIRSIFSDFYQPIEHFEVE 171 
VRN GGLAW D GILN SLV+ + E+ SI DGY +M + I S+F D + IE E+ 
Sbjct: 87 VRNSGGLAWLDSGILNLSLVLKE- -EKGFSIDDGYELMYELICSMFQDHREQIEAREIV 144 

Query: 172 TSYCPGKFDLSINGKKFAGLAQRRIKNGIAVSIYLSVCGDQKGRSQMISDFYKIGLGDTG 231 

SYCPG +DLSI+GKKFAG++QRRI+ G+AV IYL V G R++MI FY + 
Sbjct: 145 GSYCPGSYDLSIDGKKFAGISQRRIRGGVAVQIYLCVSGSGAERAKMIRTFYDKAVAGQP 204 

Query: 232 SPIAYPNVDPEIMANLSDLLDCPMTVEDVIDRMLISLKQVG 272 

+ YP + PE MA+LS+LL P V DV+ + L++L+Q G 
Sbjct: 205 TKFVYPRIKPETMASLSELLGQPHNVSDVLLKALMTLQQHG 245 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6171> which encodes the amino acid 
sequence <SEQ ID 6172>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have an uncleavable N-term signal seq 
INTEGRAL Likelihood = -0.22 Transmembrane 95 - 111 ( 95 - 111) 



Final Results 

bacterial membrane Certainty=0. 1086 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

>GP:BAB07541 GB:AP001520 unknown conserved protein in B. subtilis 
[Bacillus halodurans] 
Identities = 97/228 (42%) , Positives = 138/228 (59%) , Gaps = 2/228 (0%) 

Query: 30 ALSPFVWTEVFLKTINQEPNQLILHIWPMTRTVILGMLDRQLPYFELAKTEIGNNGYVPV 89 
ALF + ++I++LW TV+LG+ D +LP + + + + 

Sbjct: 27 ALQSFAYDDTLCTSIGKSQSPPTLRAWVHHNTVVLGIQDSRLPQIKAGIEALKGFQHDVI 86 

Query: 90 TRNIGGLAWADDGILNFSLVIPDHFSESISISNAYLIMVDVIRESFSDYYQRIEYHEIK 149 
RN GGLAW D GILN SLV+ + + SI + Y +M ++I F D+ ++IE EI 
Sbjct: 87 VRNSGGLAWLDSGILNLSLVLKEE--KGFSIDDGYELMYELICSMFQDHREQIEAREIV 144 

Query: 150 NSYCPGNFDLSIAGRKFAGIAQRRIKKGIVVSIYLSVCGDQAARGQLIKDFYEAGTQGEV 209 

SYCPG++DLSI G+KFAGI+QRRI+ G+ V IYL V G A R ++I+ FY+ G+ 
Sbjct: 145 GSYCPGSYDLSIDGKKFAGISQRRIRGGVAVQIYLCVSGSGAERAKMIRTFYDKAVAGQP 204 

Query: 210 TKVNYPQIDPECMATLSELLETPFTVAEVLERLRLTLRQLGFSLTEKS 257 

TK YP+I PE MA+LSELL P V++VL + +TL+Q G SL +S 
Sbjct: 205 TKFVYPRIKPETMASLSELLGQPHNVSDVLLKALMTLQQHGASLLTES 252 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 155/275 (56%) , Positives = 199/275 (72%) , Gaps = 8/275 (2%) 

Query: 32 QDLAQLPVSIFKDYVTDAQDAEKPFIWTEVFLREINRSNQEIILHIWPMTKTVILGMLDR 91 
+DLA LP+ ++ D A PF+WTEVFL+ IN+ ++ILHIWPMT+TVILGMLDR 

Sbjct: 10 RDLASLPIFVYGDGNKKVPGALSPFVWTEVFLKTINQEPNQLILHIWPMTRTVILGMLDR 69 

Query: 92 ELPHLELAKKEIISRGYEPVVRNFGGLAWADEGILNFSLVIPDVFERKLSISDGYLIMV 151 

+LP+ ELAK EI + GY PV RN GGLAWAD+GILNFSLVIPD F +SIS+ YLIMV 
Sbjct: 70 QLPYFELAKTEIGNNGYVPVTRNIGGLAWADDGILNFSLVIPDHFSESISISNAYLIMV 129 

Query: 152 DFIRSIFSDFYQPIEHFEVETSYCPGKFDLSINGKKFAGLAQRRIKNGIAVSIYLSVCGD 211 

D IR FSD+YQ IE+ E++ SYCPG FDLSI G+KFAG+AQRRIK GI VSIYLSVCGD 
Sbjct: 130 DVIRESFSDYYQRIEYHEIKNSYCPGNFDLSIAGRKFAGIAQRRIKKGIVVSIYLSVCGD 189 

Query: 212 QKGRSQMISDFYKIGLGDTGSPIAYPNVDPEIMANLSDLLDCPMTVEDVIDRMLISLKQV 271 

Q R Q+I DFY+ G + + YP +DPE MA LS+LL+ P TV +V++R+ ++L+Q+ 

Sbjct: 190 QAARGQLIKDFYEAGTQGEVTKVNYPQIDPECMATLSELLETPFTVAEVLERLRLTLRQL 249 

Query: 272 GFN DRLLMIRPDLVAEFNRFQAKSMANKG 300 

GF+ D+ L+ DV + RQ + + +G 

Sbjct: 250 GFSLTEKSPDQALLTNFDAV- -YERMQLEWRKEG 282 

A related GBS gene <SEQ ID 8951> and protein <SEQ ID 8952> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: -16.85 
GvH: Signal Score (-7.5): -5.07 
Possible site: 49 

>>> Seems to have no N-terminal signal sequence 
ALOM program count: 1 value: -0.22 threshold: 0.0 

INTEGRAL Likelihood = -0.22 Transmembrane 117 - 133 ( 117 - 133) 
PERIPHERAL Likelihood = 0.47 73 

modified ALOM score: 0.54 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.1086 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF01564(451 - 1116 of 1518)
EGAD|13388|BS3758(27 - 249 of 281) hypothetical 31.4 kd protein in pta 3'region {Bacillus subtilis}
OMNI|NT01BS4391 hypothetical protein
SP|P39648|YWFL_BACSU HYPOTHETICAL 31.4 KDA PROTEIN IN PTA 3'REGION.
GP|414014|emb|CAA51646.1||X73124 ipa-90d {Bacillus subtilis}
GP|2636300|emb|CAB15791.1||Z99123 alternate gene name: ipa-90d {Bacillus subtilis}
PIR|S39745|S39745 ywfL protein - Bacillus subtilis
%Match = 15.8
%Identity = 40.8 %Similarity = 61.0
Matches = 91 Mismatches = 82 Conservative Sub.s = 45



321 351 381 411 441 471 501 531 

*WNLRETYWKISSDCDKINLAEFSRERMSDLLEWQDIAQLPVSIFKDYVTDAQDAEKPFIWTEVFLREINRSNQEIILHI

Ih: I : = = = 
MANQPIDLLMQPKWRVIDQSSLGPLFDAKQSFAMDDTLCMSVGKGVSPATARS 
10 20 30 40 50 

561 591 621 651 681 711 738 768

WPMTKTVIM3MLDRELPHLEIAKKEIISRGYEPWRNFGGLAWADEGILNFSLVIPDVFERK-LSISDGYL1MVDFIRS 

I |::||: I II h =111 =111 Mill hHI Ih I hi = I II lh-1 

WVHHDTI VLGIQDTRLPFLQDGISLLESEGYRVIVRNSGGIAWLDDGVI1NISLIFED--EKKGIDIDKGYEAMVELMRR 
70 80 90 100 110 120 130 


798 828 858 888 918 972 996 

IFSDFYQPIEHFEVETSYCPGKFDLSINGKKFAGLAQRRIKNGIAVSIYLSVCGDQKG--RSQMISDFYKIGLGD--TGS 

:= : II =hl Mill : I I I I II I I II h = I I h : hll III I h I h = I Ih I I 
MLRPYNAKIEAYEIEGSYCPGSYDLSINGKKFAGISQRRVRGGVAVQIYL--CADKSGSERADLIRRFYQARLKDKQNDK 
150 160 170 180 190 200

1026 1056 1086 1116 1146 1176 1206 1236 

PIAYPNVDPEIMANLSDLLDCPMTVEDVIDRMLISLKQVGFNDRLLMIRPDLVAEFNRFQAKSMANKGMVSRDE*CPR*F 

II = II Ihlhll ::|:|:: :| II = 
KGVYPEIRPETMASLSELLQKDISVQDLMFALLTELKALSTHLYSAGLSIDEEMEFEKNLVRMAERNAKVFG
220 230 240 250 260 270 280 

SEQ ID 8952 (GBS390) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 73 (lane 7; MW 37kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 82 (lane 3; MW 62kDa). 

GBS390-GST was purified as shown in Figure 216, lane 12.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1998 

A DNA sequence (GBSx2108) was identified in S.agalactiae <SEQ ID 6173> which encodes the amino 
acid sequence <SEQ ID 6174>. This protein is predicted to be probable trimethylamine dehydrogenase
(nemA). Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2218 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
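
The "Final Results" block assigns a certainty value to each of three candidate locations, and the location with the highest value is the predicted one. The following is a minimal sketch of reading such a block, assuming one location per line in the layout shown above; the function name and the regular expression are illustrative only, and the pattern deliberately tolerates stray spaces inside the number because they occur in scanned copies of this text.

```python
import re

# One result row: location name, certainty value, verdict in parentheses.
RESULT_RE = re.compile(
    r"bacterial\s+(\w+)\s+Certainty\s*=\s*([\d\s.]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for a Final Results block."""
    results = {}
    for location, number, verdict in RESULT_RE.findall(text):
        certainty = float(re.sub(r"\s+", "", number))  # "0 . 0000" -> 0.0
        results[location] = (certainty, verdict.strip())
    return results


# Values copied from the Final Results block above.
block = """
bacterial cytoplasm Certainty=0.2218 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(block)
predicted = max(results, key=lambda name: results[name][0])
print(predicted, results[predicted])
```

Keeping the verdict string alongside the number preserves the distinction the output makes between an affirmative call and an unclear one.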

The protein has homology with the following sequences in the GENPEPT database.



WO 02/34771 PCT/GB01/04789

>GP:CAA83700 GB:Z33015 similar to trimethylamine DH [Mycoplasma capricolum]

Identities = 162/311 (52%) , Positives = 219/311 (70%) , Gaps = 1/311 (0%) 

Query: 3 OTQGNLFRPLTLPNGLSLENRFVLSPMVTNSSTSEGFVTDDDIAYAVRRAKSAPLQITGA 62

N LF P L NG LENRFVLSPM + +T +G +TD + Y RR+ SAPLQITG
Sbjct: 2 NKYEKLFEPFYL-NGFKLENRFVLSPMTLSIiATLDGKITDKEADYVKRRSHSAPLQITGG 60 

Query: 63 AYITEYGQLFEYGFSVSKDEDIPGLTKLAKAMKSKGAKAVLQLTHAGRFSSHTLARHGYV 122 

Y E+GQLFEYG S D+DIP LT+L + MK+ +LQL HAG+FS +L ++GY+ 

Sbjct: 61 WFDEFGQLFEYGISAKSDDDIPSLTRLYQEMKTDSNCVILQLAHAGKFSKTSLKKYGYL 120 



Query: 123 YGPSPMQLQSPYPHQVKELTHKDILRIIDEYVQATRRAIQAGFDGVEISSAQRLLIQTFF 182 

YGPS + +P H+V EL + I +II +Y AT R I+AGF+G+EIS AQRLLIQTFF
Sbjct: 121 YGPSYEKNHTPIEHEVLELPKEKIKQIIQDYKDATLRVIKAGFNGIEISMAQRLLIQTFF 180 

Query: 183 STFSNQRKDEYGPQTLTNRCRLGLEVFKAVQKVIREEAESDFILGFRATPEETRGSQIGY 242 

S N+R DEY NR R LEV KA+++VI + A +FI GFRATPEET G +GY 

Sbjct: 181 SQIINKRTDEYSATNFENRSRFCLEWKAIREVIDKYAPKNFIFGFRATPEETYGDILGY 240 

Query: 243 SIEEFMEFLEKILAIAQVDYLAIASWGHDVFRNTIRSEGVYKGQLVNQVIFEHFGDRVPI 302 

+ IE+F++ ++KI+ I ++ YLAI ASWGHD+ + M +RS YKGQLVN+ VI + + + +++PI 
Sbjct: 241 TIEDFIQLVDKIIEIGKISYLAIASWGHDIYLNKVRSNTKYKGQLVNKVIYDIYKNKLPI 300 

Query: 303 MATGGINSASK 313 

+++GGIN+ +K 
Sbjct: 301 ISSGGINTPTK 311 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6175> which encodes the amino acid 
sequence <SEQ ID 6176>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.3055 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 265/390 (67%) , Positives = 321/390 (81%) 



Query: 8 LFRPLTLPNGLSLENRFVLSPMVTNSSTSEGFVTDDDIAYAVRRAKSAPLQITGAAYITE 67 

LF PLTLPNG L+NRFVLSPMVTNSST +G+VT DD++YA+RRA SAPLQITGAAY+ 
Sbjct: 8 LFEPLTLPNGSQLDNRFVLSPMVTNSSTKDGYVTQDDVSYALRRAASAPLQITGAAYVDP 67 

Query: 68 YGQLFEYGFSVSKDEDIPGLTKLAKAMKSKGAKAVLQLTHAGRFSSHTLARHGYVYGPSP 127

YGQLFEYGFSV+KD DI GL +LA+AMK+KGAKAVLQLTHAGRF+SH L ++G+VYGPS 
Sbjct: 68 YGQLFEYGFSVTKDADISGLKELAQAMKAKGAKAVLQLTHAGRFASHALTKYGFVYGPSY 127 

Query: 128 MQLQSPYPHQVKELTHKDILRIIDEYVQATRRAIQAGFDGVEISSAQRLLIQTFFSTFSN 187 

MQL+SP PH+VK LT + I +I Y QATRRAIQAGFDGVE+SSAQRLLIQTFFSTFSN
Sbjct: 128 MQLRSPQPHEVKPLTGQQIEELIAAYAQATRRAIQAGFDGVEVSSAQRLLIQTFFSTFSN 187 

Query: 188 QRKDEYGPQTLTNRCRLGLEVFKAVQKVIREEAESDFILGFRATPEETRGSQIGYSIEEF 247 

+R D YG QTL NR +L L V +AVQ+VI++EA FI GFRATPEETRG+ IGYSI+EF 
Sbjct: 188 KRTDSYGCQTLFNRSKLTLAVLQAVQQVIKQEAPDGFIFGFRATPEETRGNDIGYSIDEF 247 



Query: 248 MEFLEKILAIAQVDYLAIASWGHDVFRNTIRSEGVYKGQLVNQVIFEHFGDRVPIMATGG 307 

++ ++ +L +A++DYLAIASWG VFRNT+RS G Y G+ VNQV+ ++ +++P+MATGG 
Sbjct: 248 LQL^VTO^AIOjDYIAIASWGRHVFRNTWSPGPYYGRRWQVVRDYLRNKIjPVMATGG 307 

Query: 308 INSASKVFEALQHAHMIGASTPLWDPEFLQKIKAKCSDQINLRIKVSDLEGLAIPKASF 367 

+N+ K EAL HA IG STP WDPEF KIK C + I+LRI+ +DL+ LAIP+ASF 
Sbjct: 308 MNTPDKAIFALAHADFIGVSTPFVVDPEFAHKIKEGCEESIHLRIRPADLKSLAIPQASF 367 



Query: 368 KDIVPLMDYGESLPKEAREVFRELRSNYRE 397
KDIVPLMDYGESLPKE+R +FR L NY+E
Sbjct: 368 KDIVPLMDYGESLPKESRTLFRSLTHNYKE 397 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1999 

A DNA sequence (GBSx2109) was identified in S.agalactiae <SEQ ID 6177> which encodes the amino 
acid sequence <SEQ ID 6178>. Analysis of this protein sequence reveals the following: 

Possible site: 53 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3748 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04594 GB:AP001510 unknown conserved protein [Bacillus halodurans] 
Identities = 121/333 (36%) , Positives = 192/333 (57%) , Gaps = 12/333 (3%) 

Query: 1 MKLSVLDYGLIDYGKTASDAIQETILLSQFAERLGYHQFWVAEHHGVKAFSISNPELMIM 60
MKLSVLD I YG A +A+++T L++ E LGYH+ FWV+EHH + S+PE++I
Sbjct: 1 MKLSVLDQSPIAYGSNAKEALRQTTEIAKVTEALGYHRFWVSEHHDASTIiAGSSPEVLIA 60

Query: 61 HIANQTKSIKIGSGGIMPLHYSSFKLAETLKTLETCHPNRVSIGLGNSLGTVKVSNALRS 120
HLA TK I++GSGG+M HYS++K+AE K LE HP R+ +GLG + G + ++
Sbjct: 61 H1AAHTKKIRLGSGGWLPHYSAYKVAENFKLLEALHPGRIDVGLGRAPGGMPIAKMALQ 120

Query: 121 LHK- - -AHDYEEVLEELKSWLIDESSSKEPL VQPTLS S FPDLYVLGSGQKSAYLAA 173
K H Y ++++ +L D+ + P + + PD+++LGS SA +AA
Sbjct: 121 EGKEQNIHKYPLQVKDVIGYLQDDLPTDHRFHGLKATPLIDTVPDVWLLGSSGGSANVAA 180

Query: 174 KLGLGFTFGVFPFMDKDPLTEAKKLSSLYYHQFEEYYPNKSPNL^^VAAFWIADTSEEAE 233
+ G GF F F++ + +A + Y F+ P VA FV+ ADT E+A+
Sbjct: 181 ENGTGFAFA- -HFINGEGGVQAVE- - - SYRETFQPSALFDRPQTSVAI FVI CADTDEQAD 235

Query: 234 NIAKTLDIWMLGNKDFNEFATFPTIEEANHYQLTPEQKAKIKSNRDRMIVGDPKQVKESL 293
IA +LD+ ++ ++ P+IE A Y +P ++A+I+ NR RMIVG PK V++ L
Sbjct: 236 QIASSLDLSLIMLENGQLSKGTPSIESALSYPYSPFERARIRENRKRMIVGSPKAVRQQL 295

Query: 294 DALVNASQAEELLLIPLVPGLDQRIKSLKLLSQ 326
L A + EE++++ + + RI+S +LL +
Sbjct: 296 VELARAYETEEVIWTITHRFEDRIRSYELLGE 328





A related DNA sequence was identified in S.pyogenes <SEQ ID 6179> which encodes the amino acid
sequence <SEQ ID 6180>. Analysis of this protein sequence reveals the following:

Possible site: 51 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.60 Transmembrane 212 - 228 ( 210 - 229) 


Final Results 

bacterial membrane Certainty=0.2041 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>


An alignment of the GAS and GBS proteins is shown below. 

Identities = 173/329 (52%) , Positives = 241/329 (72%) , Gaps = 1/329 (0%) 



Query: 1 MKLSVLDYGLIDYGKTASDAIQETILLSQEAERLGYHQFWVAEHHGVKAFSISNPELMIM 60 
MK+S+LDYG+ID KT +A+ ET L+Q A++LG+H+ FWVAEHH + AF+IS+PEL++M
Sbjct: 1 MW c 3TT>nYflVTnKT3KTPOKAT.TiRTBf'T.AnVAnKT ,nFHRFWVAF,HHNTYAFAISSPEIjTiMM 60

Query: 61 HLANQTKSIKIGSGGIMPLHYSSFKIAETLKTLETCHPNRVSIGLGNSLGTVKVSNALRS 120
HIA+ TK I+IGSGGIMPLHYSSFK+AE + TLE HPNR+ +G+GNSLGT V AL S
Sbjct: 61 HLADHTKOIRIGSGGIMPLHYSSFKIAEWIMTLEAIjHPMRIDLGIGNSLGTTLVORALSS 120

Query: 121 LHKAHDYEEVLEELKSWLIDESSSKEPL-VQPTLSSFPDLYVLGSGQKSAYLAAKLGLGF 179
+H Y +V+ EL +L + S P+ V P +++P ++ L + ++A LA +LGLG+
Sbjct: 121 TNPTCDClYflnWTRT.YriVTiNPriHTiCiPT-PT WNPPnT^JTVPnTWTTi c iMf : lT.F.TA'ELAC?nT.f}T.GY 180

Query: 180 TFGVFPFMDKDPLTEAKKLSSLYYHQFEEYYPNKSPffiMVAAFW 239
TFG+FP++ KDP+TEAK++S+ Y F K P L++A F+V++DT E+AE +AK L
Sbjct: 181 240

Query: 240 DIWMLGNKDFNEFATFPTIEEANHYQLTPEQKAKIKSNRDRMIVGDPKQVKESLDALVNA 299
DIWMLG +DFNEF T+P +EEA +Y LT +Q+ I +NR RM++G P VK+ LD L+ A
Sbjct: 241 DIWMLGQQDFTffiFKTYPDVEEARNYHLTEKQREAIAANRSRMVIGSPHTVKKQLDRLIEA 300

Query: 300 SQAEELLLIPLVPGLDQRIKSLKLLSQLY 328
QA+ELL IPLVP R ++L+LL+ LY
Sbjct: 301 CQADELLAI PLVPEFANRQRTLELLADLY 329





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2000 

A DNA sequence (GBSx2110) was identified in S.agalactiae <SEQ ID 6181> which encodes the amino 
acid sequence <SEQ ID 6182>. Analysis of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2384 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:ARF81345 GB:AC007767 Identical to a glycine cleavage system
H-protein precursor from Arabidopsis thaliana gb|P25855.
It contains a glycine cleavage H-protein domain
PF|01597. ESTs gb|R90208, gb|AI
Identities = 30/91 (32%) , Positives = 53/91 (57%) , Gaps = 1/91 (1%) 

Query: 18 TISLTPELQDDLGWGYVEFTD-DANLEVBDVIIiNIEASKTvMAILSPLTGKWKVNTAA 76 

TI +T QD LG V +VE + ++++ + +E+ K ILSP++G+V++VNT 

Sbjct: 59 TIGITDHAQDHLGEWFVELPEANSSVSKEKSFGAVESVKATSEILSPISGEVIEVNTKL 118 

Query: 77 SQEPTLLNSEKADENWLWLTEVDYAB.FEAL 107 

++ P L+NS ++ W++ + A EAL 

Sbjct: 119 TESPGLINSSPYEDGWMIKVKPSSPAELEAL 149 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6183> which encodes the amino acid 
sequence <SEQ ID 6184>. Analysis of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3544 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 80/110 (72%) , Positives = 98/110 (88%) 

Query: 1 MKKIANYLLIEKNEELYTISLTPELQDDLGWGYVEFTDDA^EVDDVILNIEASKTVMA 60 

MKKIANYLLIEK ++ YTIS+TPELQDD+GT+GY EFTD+ +L VDD+ I LN+EASKTVM+ 
Sbjct: 1 MKKIANYLLIEKTDDRYTI SMTPELQDDIGTIGYAEFTDNDHLAVDDI IIjNLEASKTVMS 60 

Query: 61 ILSPLTGKWKVOTAASQEPTLLNSEKftDENWLVVLTEVDYAAFEALENA 110 

+LSPL G W+ N AA+ PTLLNSEKA+ENW+WLT+VD AAF+ALE+A 
Sbjct: 61 VLSPlAGAWERNFJy^TLTPTLI^SEKAEKtWIVVLTDVDQAAFDALEDA 110 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2001 

A DNA sequence (GBSx2111) was identified in S.agalactiae <SEQ ID 6185> which encodes the amino 
acid sequence <SEQ ID 6186>. This protein is predicted to be LRP16 (b1045). Analysis of this protein
sequence reveals the following: 

Possible site: 17 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0608 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF15294 GB:AF202922 LRP16 [Homo sapiens] 
Identities = 73/171 (42%) , Positives = 98/171 (56%) , Gaps = 13/171 (7%) 

Query: 88 DICLLQVDAIVNAANSKLLGCFIPNHHCIDNQIHTFAGSRLRLACHQLMTQQGRMEAVGQ 147

DI L+VDAIVNAANS LLG +D IH AG L C L + + G+
Sbjct: 78 DITKLEVDAI VNAANSSLLG GGGVDGCIHRAAGPLLTDECRTLQSCK TGK 127 

Query: 148 AKLTESYHLPCKYVIHTVGPYVKVDQKPSRIREDLLKSSYKSCLQLAVRANLKTIVFPCI 207 

AK+T Y LP KYVIHTVGP + S+ E L+S Y S L L + L+++ FPCI 
Sbjct: 128 AKITGGYRLPAKYVIHTVGPIAYGEPSASQAAE--LRSCYLSSLDLLLEHRLRSVAFPCI 185 

Query: 208 STGEFGFPNQRAAELAVQAILEWQRENQHKL-YIIFNTFTPKDQDIYQKLL 257

STG FG+P + AAE+ + + EW +++ K+ +I F KD+DIY+ L
Sbjct: 186 STGVFGYPCFAAAEIVLATLREWLEQHKDKVDRLIICVFLEKDEDIYRSRL 236 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6187> which encodes the amino acid 
sequence <SEQ ID 6188>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1992 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 139/266 (52%) , Positives = 178/266 (66%) , Gaps = 6/266 (2%)

Query: 1 MPNQKQLLLAMIEYLQSEKLTDVDDIj RTTDLQTVWRGLVNQQDPQNI SQEYLSLED 56 

MP+ LL MI LQ+E+LT T Q +WR L+NQ+ +S++YL+LED 
Sbjct: 1 MPSSFDLLGEMIGLLQTEQLTSSWACPLPNALTKRQDLWRALINQRPALPLSKDYLNLED 60

Query: 57 RYLSHWWTQKVKTIDVCHQTVYSNVFTYHGDICLLQVDAIVNAANSKLLGCFIPNHHCI 116 

YL W + ++ C +T Y+++F YHGDI L VDAIVNAANS+LLGCF PNH CI 
Sbjct: 61 AYLDDWRASFVPVSVKDCQKTNYTSLFLYHGDIRYLAVDAIVNAANSELLGCFSPNHGCI 120 

Query: 117 DNQIHTFAGSRLRLACHQLMTQQGRMEAVGQAKLTESYHLPCKYVIHTVGPYVKVDQKPS 176

DN IHTFAGSRLRLAC +MT+QGR EA+GQAKLT +YHLP Y+IHTVGP + S
Sbjct: 121 DNAIHTFAGSRLRLACQAIMTEQGRKEAIGQAKLTSAYHLPASYIIHTVGPRITKGHHVS 180

Query: 177 RIREDLLKSSYKSCLQLAVRANLKTIVFPCISTGEFGFPNQRAAELAVQAILEWQRENQH 236 

IR DLL Y+S L LAV+A L ++ F ISTGEFGFP + AA++A++ +L+WQ E+ 
Sbjct: 181 PIRADLLARCYRSSLDLAVKAGLTSLAFCSISTGEFGFPKKEAAQIAIKTVLKWQAEHPE 240 

Query: 237 K--LYIIFNTFTPKDQDIYQKLLLKE 260

L IFNTFT +D+ +Y L KE 
Sbjct: 241 SKTLTTIFNTFTSEDKALYDTYLQKE 266 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 2002 

A DNA sequence (GBSx2112) was identified in S.agalactiae <SEQ ID 6189> which encodes the amino
acid sequence <SEQ ID 6190>. Analysis of this protein sequence reveals the following:

Possible site: 41 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2171 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6191> which encodes the amino acid 
sequence <SEQ ID 6192>. Analysis of this protein sequence reveals the following: 

Possible site: 41

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2477 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 218/284 (76%) , Positives = 250/284 (87%) 


Query: 4 WKTLEKTNHSQSEILSQLIEESDAIWGIGAGMSAADGFTYIGPRFEEAFPDFIAKYQLL 63 

W T + N +Q+E L+QLI +E+DA+WGIGAGMSAADGFTYIG RFE AFPDFIAKYQ L 
Sbjct: 4 WTTYPQKNLTQAEQIiAQLIKEADALWGIGAGMSAADGFTYIGSRFETAFPDFIAKYQFL 63 

Query: 64 DMLQASLYDFEDWEEYWAFQSRFVALNYLDQPVGQAYLDLKDILAKKEYHIITTNADNAF 123

DMLQASL+DFEDW+EYWAFQSRFVALNYLDQPVGQ+YLDLK+IL K+YHI ITTNADNAF 
Sbjct: 64 DMLQASLFDFEDWQEYWAFQSRFVALNYLDQPVGQSYLDLKEILGTKDYHIITTNADNAF 123 

Query: 124 AVADYNLEKVFHIQGEYGLWQCSQHCHQQTYRNDQAIRQMIAQQKDMKIPSNLIPKCPKC 183 
VA Y+ +FHIQGEYGLWQCSQHCHQQTY++D IRQMIA+QK+MK+ P LIP CP+C

Sbjct: 124 WAGYDPHNIFHIQGEYGLWQCSQHCHQQTYKDDTVTRQMIAEQKNMKVPGQLIPHCPEC 183 



Query: 184 DQPFEINKRNEEKGMVEDADFHAQRQRYENFLSQHQNDKVLYLEIGVGHTTPQFIKHPFW 243 
+ PFEINKRNEEKGMVEDADFHAQ+ RYE FLS+H+ KVLYLEIGVGHTTPQFIKHPFW 
Sbjct: 184 EAPFEINKRNEEKGMVEDADFHAQKARYEAFLSEHKEGKVLYLEIGVGHTTPQFIKHPFW 243

Query: 244 RFVSLNENSLFVTLNHKHYRIPQKIRSRSVQLTQHIAELIAEAK 287 

+ VS N N+LFVTLNHKHYRIP IR +S++LT+HIA+LI+ K
Sbjct: 244 KRVSENPNALFVTLNHKHYRIPLSIRRQSLELTEHIAQLISATK 287

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2003 

A DNA sequence (GBSx2113) was identified in S.agalactiae <SEQ ID 6193> which encodes the amino 
acid sequence <SEQ ID 6194>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1086 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12865 GB:Z99109 similar to lipoate-protein ligase [Bacillus subtilis] 
Identities = 130/331 (39%) , Positives = 206/331 (61%) , Gaps = 5/331 (1%) 



Query: 9 NGKRITDGAIALAMQVYILQNVFLDDDILFPYYCDPKVEIGKFQNAVIETNQEYLKEHDI 68
+ + I D I IA++ Y ++++ + L Y P + IGK QN + E N +Y++E+ I
Sbjct: 5 DNQNINDPRINLAIEEYCVKHLDPEQQYLLFYVNQPSIIIGKNQNTIEEINTKYVEENGI 64

Query: 69 PWRRDTGGGAVYVDSGAVNICYLMKDHGQ-FGDFKRAYEPAIKALKTLGASSVEMRERN 127
WRR +GGGAVY D G +N ++ KD G F +FK+ EP I+AL LG + E+ RN
Sbjct: 65 IVVRRLSGGGAVYHDLGNLNFSFITKDDGDSFHNFKKFTEPVIQALHQLGVEA-ELSGRN 123

Query: 128 DLVIDGKKySGAftMTIWGRlYGGYSLLLDVDFDAMEKVLNPNRKKIESKGIKSVRSRVG 187
D+V+DG+K+SG A GRI+ +L+ D D + L + KIESKGIKS+RSRV
Sbjct: 124 DIWDGRKISGNAQFATKGRIFSHGTLMFDSAIDHWSALKVKKDKIESKGIKSIRSRVA 183

Query: 188 DIRSHLSEDYRHITTDQFKDLMVCQLLHIDHIDQAKRYHLTEKDWARIDALADEKYKNWD 247
+I L + +TT++F+ +++ + + + Y LTEKDW I ++ E+Y+NWD
Sbjct: 184 NISEFLDDK- - -MTTEEFRSHLLRHIFNTNDVGNVPEYKLTEKDWETIHQISKERYQNWD 240

Query: 248 WNYGNSPQYSYHRDARFPSGTYDFHLEIEKGIITNCRIYGDFFSSKDISDIENLLIGCPM 307
WNYG SP+++ + R+P G+ D HLE++KG I +C+I+GDFF D+S+IENLL+G
Sbjct: 241 WNYGRSPKFNLNHSKRYPVGSIDLHLEVKKGKIEDCKIFGDFFGVGDVSEIENLLVGKQY 300

Query: 308 KEELVLEKLSTLSLEDYFGQTSPEEIKAVLF 338
+ ++ + L ++L+ YFG + E+ +++
Sbjct: 301 ERSVIADVLEGVNLKHYFGNITKEDFLDLIY 331





A related DNA sequence was identified in S. pyogenes <SEQ ID 6195> which encodes the amino acid 
sequence <SEQ ID 6196>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.0939 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 248/339 (73%) , Positives = 283/339 (83%) 



